Abstract

We consider a multi-object detection problem over a sensor network (SNET) with limited-range multi-modal sensors. A limited-range sensing environment arises in a sensing field prone to signal attenuation and path losses. The general problem complements the widely considered decentralized detection problem, where all sensors observe the same object. In this paper we develop a distributed detection approach based on recent developments of the false discovery rate (FDR) and the associated BH test procedure. The BH procedure is based on rank ordering of scalar test statistics. We first develop scalar test statistics for multidimensional data to handle multi-modal sensor observations and establish their optimality in terms of the BH procedure. We then propose a distributed algorithm, in the ideal case of infinite attenuation, for identifying sensors that are in the immediate vicinity of an object. We demonstrate communication message scalability to large SNETs by showing that the upper bound on the communication message complexity scales linearly with the number of sensors in the vicinity of objects and is independent of the total number of sensors in the SNET. This brings forth an important principle for evaluating the performance of an SNET, namely, the need for scalability of communications and performance with respect to the number of objects or events in an SNET irrespective of the network size. We then account for finite attenuation by modeling sensor observations as corrupted by uncertain interference arising from distant objects and by developing robust extensions to our idealized distributed scheme. The robustness properties ensure that both the error performance and the communication message complexity degrade gracefully with interference.

1 Introduction

The design and deployment of sensor networks (SNETs) for distributed decision making pose fundamental challenges due to energy constraints and environmental uncertainties. While power and energy constraints limit collaboration among sensor nodes, some form of collaboration is necessary to overcome uncertainty and meet the reliability requirements of the decision-making process.

In this paper we focus on the problem of distributed detection of localized events, sources, or abnormalities (hereafter, objects) observed simultaneously over different sections of a large sensor network. Such problems arise naturally in many settings, such as environmental monitoring, species distribution and taxonomy, and wide-area surveillance [17, 20]. The common thread in all of these applications is that the objects are not observed by all the sensors in the SNET; rather, each object is in the field of view of only a small subset of the sensors. We refer to all such problems as local information problems and seek to devise a distributed detection strategy that satisfies given false alarm and communication cost constraints.

It is worth contrasting the local information problem with its global counterpart. In a global information problem a single object is observed across the entire network (see Figure 1 for an illustration of local and global information problems). This type of problem has been extensively studied in the literature in the context of decentralized/distributed detection theory [6, 21, 24, 26, 28, 30]. In the centralized version of the problem one seeks the optimal asymptotic exponent at which the error probability goes to zero as a function of the number of observations [23, 27]. The decentralized version involves a similar problem with quantized observations [28, 30]. Motivated by network topologies, researchers have also investigated several architectures ranging from fusion-centric to ad-hoc consensus-based approaches [3, 6, 18, 21, 22, 24, 26, 28-30]. Local information problems and corresponding decentralized algorithms have only recently begun to be addressed in the SNET setting [12-15]. A fundamental difference between local and global information problems appears even in the centralized scenario. In the local information case, since each object is in the field of view of at most a constant number of sensors, the error probability cannot be made to go to zero. Furthermore, since there are multiple locations, each location has to be tested simultaneously for the presence or absence of objects. In these cases neither the total number of objects in the sensor field nor the likelihood of finding an object in a specific location is known a priori. It turns out that in these cases the error probability is dominated by the multiple tests (one for each location); this issue is referred to as multiple comparisons testing in the statistical literature [4, 19]. Another fundamental difference lies in which performance measures are typically characterized. For global information problems, the asymptotics of the error probability for a single object with an increasing number of sensors (and quantized observations) are usually derived; for local information problems, the scalability of error rates and communication costs with an increasing number of objects is characterized. An important aspect of our work is to show that both of these quantities, namely the error rates and the communication costs, scale with the number of sensors in the immediate vicinity of objects rather than with the size of the SNET.

Figure 1: (a) Global information problem; (b) local information problem. In global information problems the sensors observe a single global phenomenon, which leads to a binary hypothesis testing problem with multiple observations. In local information problems only a subset of the sensors observe a number of phenomena, so that each sensor has its own set of hypotheses. This leads to a multiple hypothesis testing problem.

We present a distributed detection scheme for local information problems based on the concept of the false discovery rate and the associated BH procedure [4]. The BH procedure relies on rank ordering of test statistics. In several SNET scenarios multi-modal sensors are employed, which generate multi-dimensional observations for which rank ordering is unclear. We also consider a sensing field with signal attenuation and path losses, which effectively imposes a limited sensing range on the sensors.

To the best of our knowledge, multi-dimensional settings have not received significant attention in the context of FDR, since it is generally difficult to rank-order the observations. Recent statistical work [7] proposes a coordinate-by-coordinate ordering, but this generally leads to sub-optimal error performance. To account for multi-dimensional observations, we devise a transformation that maps multi-dimensional observations to scalar test statistics, which turns out to have optimal error performance. These scalar statistics then form the basis for a distributed detection scheme. We show that the communication cost of the scheme scales linearly with the number of sensors that observe an object, and not with the number of sensors in the SNET. Furthermore, the proposed scheme guarantees the detection performance of centralized procedures. Next, we account for signal attenuation and path losses by modeling sensor observations as corrupted by uncertain interference resulting from unknown objects that are not in the immediate vicinity of the sensor. The interference can be modeled as a perturbation of the nominal observed distributions, and we establish the robustness of our test statistic to such perturbations.

The organization of the paper is as follows. In Section 2 we discuss the connection of distributed detection of localized phenomena to the multiple hypothesis testing problems considered in the statistical literature. We discuss possible performance criteria in detail and present the reasoning behind our choice. We also discuss the main contributions of this work in that section. In Section 3 we discuss the setup of our problem and describe ideal and non-ideal sensing models. In Section 4 we propose a test statistic formulation and discuss its important properties. In Section 5 we present the distributed detection algorithm and examine its scaling properties. We also show there that the distributed algorithm is equivalent to its centralized counterpart with high probability. In Section 6 we show certain robustness properties of the test statistics to uncertainties in the distribution of observations. We also show that our choice of performance criterion degrades gracefully with perturbations of the distribution of observations. These results allow us to address cases where we do not know the exact distributions. In Section 7 we present simulations and show that the chosen method is able to meet the Bayes Oracle error performance. In this section we also present the scaling of communication costs and discuss some interesting results. We finally present our concluding remarks in Section 8.

2 Discussion of Performance Metrics and Contributions 

In this section we review different criteria and present empirical evidence for adopting the Benjamini-Hochberg (BH) procedure, which is associated with the false discovery rate (FDR) criterion, as a basis for local information problems. Local information problems invariably turn out to be multiple comparison testing problems. There is currently no consensus on a universally applicable performance metric for these problems. In the literature, location-by-location Neyman-Pearson (NP) tests, family-wise error rate (FWER) tests such as the Bonferroni procedure, average error probability, and the false discovery rate have all been proposed. Rather than discuss the merits of the different criteria, we describe their performance in terms of average errors in our context, wherein both the object density and the observed distributions may only be partially known. The NP tests and the FWER criterion are non-adaptive decision rules (i.e., threshold rules that do not depend on the observed realization). Generally these methods result in poor performance in terms of the number of false alarms and missed detections. It turns out that the BH procedure, in contrast, is an adaptive rule that adapts to the observed realization and generally results in good error performance.

To be concrete, consider a set of $m$ sensors, $S$. Associated with each sensor $s \in S$ there is a null or alternative hypothesis $H_s \in \{H_0, H_1\}$, corresponding to whether or not the sensor observes an object of interest. Sensor $s$ generates an observation $X_s \in \mathbb{R}$ independently of the other sensors, with probability density $g_{0s}$ if $H_s = H_0$ and $g_{1s}$ if $H_s = H_1$. The general problem involves situations where the actual number of objects is unknown and, due to path losses and multi-path effects, the distributions $g_{0s}, g_{1s}$ are only partially known. To analyze different strategies, we denote by $u(x_1, x_2, \ldots, x_m)$ any decision rule that selects a set of sensors $S_1 = \{s_1, s_2, \ldots, s_r\}$ and assigns to them the alternative hypothesis, i.e., $H_s = H_1$ for $s \in S_1$ and $H_0$ otherwise. The outcome of a decision rule can be summarized by the following counts: $R$ is the total number of sensors identified with objects (i.e., $R = |S_1|$); $V$ is the number of sensors falsely placed into $S_1$, i.e., the number of false alarms; and $T$ is the number of sensors that observe an object but are not placed into $S_1$, i.e., the number of missed detections. Obviously, we desire both $V$ and $T$ to be small, and we seek a decision rule that makes this possible.
2.1 Controlling False Alarm Probability: Non-Adaptive Strategies

First consider the situation where the distributions $g_{0s}, g_{1s}$ are known but $m_1$ is unknown and arbitrary. In this case we can consider several possibilities. (a) Neyman-Pearson test for each location: maximize the local detection power $P_D$ subject to a local false alarm probability constraint $P_F^l \le \gamma$ for each location. The optimal decision rule in this situation is the well-known likelihood ratio test [27]. Although this rule is locally optimal, it is not guaranteed to provide good overall performance, and it is commonly referred to as uncorrected testing. Indeed, the number of false alarms scales with the total number of sensors, i.e., $E(V) = O(m)$ (each of the $m_0$ null sensors raises a false alarm with probability up to $\gamma$). (b) The Bonferroni procedure [4] overcomes this issue by imposing a highly restrictive false alarm probability constraint $\gamma' = \gamma/m$ on each sensor. This strategy guarantees control of the global false alarm probability (i.e., $\Pr(V \ge 1)$)¹ at level $\gamma$ (this follows from the union bound); however, it in turn leads to poor detection performance, i.e., a large number of misses are incurred.
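The scaling of false alarms in (a) and the union-bound guarantee in (b) can be checked with a short Monte Carlo sketch. It assumes, purely for illustration, that all $m$ sensors are null with $X_s \sim N(0,1)$ and that each rule declares $H_1$ when $X_s$ exceeds a one-sided threshold; the function name and parameter values are hypothetical. Under these assumptions the uncorrected rule yields roughly $\gamma m$ false alarms, while the Bonferroni rule keeps $\Pr(V \ge 1)$ at or below about $\gamma$.

```python
from statistics import NormalDist

import numpy as np


def false_alarm_stats(m, gamma=0.05, trials=2000, seed=0):
    """Compare uncorrected and Bonferroni thresholding when all m sensors are null.

    Assumes X_s ~ N(0, 1) under H0 and a one-sided test that declares H1 when
    X_s exceeds a threshold. Returns the empirical E[V] and Pr(V >= 1) of both rules.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((trials, m))          # one row per Monte Carlo trial
    t_unc = NormalDist().inv_cdf(1.0 - gamma)      # per-sensor level gamma
    t_bon = NormalDist().inv_cdf(1.0 - gamma / m)  # per-sensor level gamma / m
    v_unc = (x > t_unc).sum(axis=1)               # false alarms per trial
    v_bon = (x > t_bon).sum(axis=1)
    return {
        "uncorrected_mean_V": float(v_unc.mean()),
        "uncorrected_P(V>=1)": float((v_unc >= 1).mean()),
        "bonferroni_mean_V": float(v_bon.mean()),
        "bonferroni_P(V>=1)": float((v_bon >= 1).mean()),
    }


for m in (10, 100, 1000):
    print(m, false_alarm_stats(m))
```

Because only the null case is simulated, the sketch shows the false-alarm side of the trade-off; the corresponding loss of detection power under Bonferroni would require adding sensors that observe objects.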

This leads to the fundamental question of whether there exist other local or global decision rules $u(\cdot)$ that can control both the false alarm and miss probabilities $\Pr(V \ge 1)$ and $\Pr(T \ge 1)$ (or a close relaxation, such as the probability of $k$ false alarms and misses for some constant $k$ independent of $m$).

The optimal decision rule for maximizing the worst-case global detection power $P_D = 1 - \Pr\{T \ge 1\}$ subject to a global false alarm constraint on $P_F = \Pr\{V \ge 1\}$ is generally intractable. It turns out that the sum of the worst-case false alarm and miss probabilities can be bounded from below by an entropic term that is a function only of the local SNR.

Theorem 2.1 Let $S$ be a set of $m$ hypothesis tests, $H_s \in \{H_0, H_1\}$ the hypothesis for test $s \in S$, and $X_s$ the observation for test $s \in S$. Suppose $u(X_1, X_2, \ldots, X_m)$ is any strategy that maps the observations to hypothesis decisions. Then,

$$\gamma_m = \min_{u} \max_{\{H_s\}_{s \in S}} \Big( \Pr\{V \ge 1 \mid \{H_s : s \in S\}\} + \Pr\{T \ge 1 \mid \{H_s : s \in S\}\} \Big) \ge \mathcal{H}(H_s \mid X_s)$$

where $\mathcal{H}(\cdot \mid \cdot)$ is the conditional entropy computed with a Bernoulli prior with probability 1/2 over $H_1$ and $H_0$ over the $m$ tests. It follows that there exists no decision strategy for which both the false alarm and miss probabilities can simultaneously be smaller than $\mathcal{H}(H_s \mid X_s)/2$.

Proof: See Appendix.
Remark: $H_s$ is a binary random variable, so its entropy (and its conditional entropy) is at most one bit. Nevertheless, depending on the measurement noise at each sensor, $\mathcal{H}(H_s \mid X_s)$ can be arbitrarily close to one.
Remark: This result can be generalized to lower-bound the probability of $k$ false alarms and misses as well, using the generalized Fano bounds we developed in [2]. Based on those results it follows that the probability does not improve unless either $V$ or $T$ is allowed to grow with $m$.
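To get a feel for the size of the bound in Theorem 2.1, the sketch below evaluates $\mathcal{H}(H_s \mid X_s)$ numerically for one assumed sensor model, a unit-variance Gaussian mean shift ($H_0: N(0,1)$, $H_1: N(\mu,1)$, $\mu \ge 0$) with the equiprobable prior used in the theorem. The model, the integration grid, and the function names are illustrative assumptions, not part of the theorem. A small $\mu$ (low local SNR) drives the bound toward one.

```python
import numpy as np


def binary_entropy_bits(p):
    """Entropy (in bits) of a Bernoulli(p) variable, computed elementwise."""
    q = 1.0 - p
    p = np.clip(p, 1e-300, 1.0)   # avoid log2(0); the clipped terms contribute ~0
    q = np.clip(q, 1e-300, 1.0)
    return -(p * np.log2(p) + q * np.log2(q))


def conditional_entropy_gaussian_shift(mu, n_grid=200_001):
    """H(H_s | X_s) in bits for an equiprobable prior over H0: N(0,1), H1: N(mu,1).

    Uses a Riemann sum over a grid wide enough that the neglected tails are negligible.
    """
    x = np.linspace(-12.0, 12.0 + mu, n_grid)
    dx = x[1] - x[0]
    g0 = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    g1 = np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)
    mixture = 0.5 * (g0 + g1)          # marginal density of X_s
    posterior_h1 = g1 / (g0 + g1)      # Pr(H_s = H1 | X_s = x)
    return float(np.sum(mixture * binary_entropy_bits(posterior_h1)) * dx)


for mu in (0.0, 0.5, 1.0, 2.0, 4.0):
    bound = conditional_entropy_gaussian_shift(mu)
    print(f"mu={mu}: lower bound on Pr(V>=1) + Pr(T>=1) = {bound:.3f}")
```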

The above discussion brings to light the fact that non-adaptive decision rules lead to poor performance. 



2.2 Adaptive Strategies 

To establish the performance of adaptive strategies, i.e., strategies that adapt to the specific realization, we need lower bounds on error performance. We obtain them by means of a Bayes Oracle, for which the distributions $g_{0s}, g_{1s}$ as well as the prior probability of finding an object in the vicinity of a sensor are known (alternatively, we can consider situations where the total number of objects is known). Define the average ratio $m_1/m$ as the object density and its complement, the average of $m_0/m$, as the sparsity level. Under this scenario it is easy to see that thresholding the likelihood ratio at each sensor is optimal, and the optimal threshold is a function of the object density and the distributions under each hypothesis. Furthermore, the error performance of the Bayes Oracle is a lower bound on the achievable error probability.
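As a concrete instance of the Bayes Oracle, the sketch below assumes Gaussian densities with known parameters and an object probability $p$ known to the oracle; the defaults, $N(0,1)$ under $H_0$ and $N(0,4)$ under $H_1$, follow our reading of the Figure 3 setup, and the helper names are hypothetical. Declaring $H_1$ whenever $p\,g_{1s}(x) > (1-p)\,g_{0s}(x)$ minimizes the expected number of misclassified sensors, which makes the dependence of the threshold on the object density explicit.

```python
import math

import numpy as np


def gaussian_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2), elementwise."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def bayes_oracle(x, p, mean0=0.0, sd0=1.0, mean1=0.0, sd1=2.0):
    """Per-sensor Bayes Oracle decisions (True = declare H1).

    Assumes each H_s is independently H1 with known probability p (the object
    density) and known Gaussian densities under H0 and H1. Declaring H1 when
    p*g1(x) > (1-p)*g0(x) minimizes the expected number of misclassified sensors.
    """
    x = np.asarray(x, dtype=float)
    log_lr = gaussian_logpdf(x, mean1, sd1) - gaussian_logpdf(x, mean0, sd0)
    return log_lr > math.log((1.0 - p) / p)


print(bayes_oracle([0.0, 1.0, 2.5, 4.0], p=0.1))
print(bayes_oracle([0.0, 1.0, 2.5, 4.0], p=0.5))
```

In this model, raising `p` from 0.1 to 0.5 lowers the threshold on $|x|$ from about 2.78 to about 1.36, which is the adaptivity to object density that a fixed-level test lacks.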

The question therefore arises as to whether there exists a procedure that achieves the Bayes Oracle bound for local information problems and does not depend on knowledge of the object density or precise knowledge of the distribution in the presence of an object. This is particularly relevant since path losses and attenuation are not precisely known. Motivated by these issues, Benjamini and Hochberg [4] formulated the false discovery rate (FDR) criterion and provided a distribution-invariant algorithm, the Benjamini-Hochberg (BH) procedure, that controls the FDR. An interesting result presented in [1] shows that controlling the FDR can result in asymptotic minimax optimality of the error probability. The FDR framework [4] seeks to control the worst-case expected ratio $V/R$, i.e.,

$$\mathrm{FDR} = \max_{\{H_s\}_{s \in S}} E\{V/R \mid \{H_s\}_{s \in S}\}$$

where $S$ is the set of sensors and $H_s$ is the hypothesis at sensor $s$. For simplicity of notation, from here on we write $\mathrm{FDR} = E\{V/R\}$ and $\Pr\{\cdot \mid \{H_s\}_{s \in S}\} = \Pr\{\cdot\}$ whenever it is clear from the context. It is easy to show that $\mathrm{FDR} \le \Pr\{V \ge 1\}$, which implies that FDR is a relaxation of the global false alarm probability.

Figure 2: BH procedure. (a) Example definition of test statistic transformation; (b) ordered test statistics and threshold line.

¹Strictly speaking, we should write $\max_{H_s \in \{H_0, H_1\}} \Pr\{V \ge 1 \mid \{H_s\}_{s \in S}\}$ to denote that we are looking at the worst-case probability. However, we avoid this cumbersome notation whenever it is clear from context.
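The inequality $\mathrm{FDR} \le \Pr\{V \ge 1\}$ follows from a pointwise bound. The sketch below assumes the usual convention $V/R = 0$ when $R = 0$. If $V = 0$ both sides of the first inequality are zero; if $V \ge 1$, then $R \ge V \ge 1$ and $V/R \le 1$. Taking expectations, and then the maximum over hypothesis configurations on both sides, preserves the inequality:

```latex
\frac{V}{R} \;\le\; \mathbf{1}\{V \ge 1\}
\quad\Longrightarrow\quad
\mathrm{FDR} \;=\; E\{V/R\} \;\le\; E\{\mathbf{1}\{V \ge 1\}\} \;=\; \Pr\{V \ge 1\}
```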

FDR can be controlled using the so-called BH procedure, which we briefly explain here. As depicted in Figure 2, test statistics are first computed from the observations. The test statistic of an observation is obtained through any (non-unique) transformation that generates a uniform distribution, U[0, 1], under the null hypothesis. The test statistics are then rank ordered, and a desired FDR threshold $\gamma$ is chosen. Let $Y_{(i)}$ be the $i$-th smallest test statistic. The largest index $i_{\max}$ such that $Y_{(i)} \le \frac{i}{m}\gamma$ is chosen as the decision point, and the test statistics whose rank indices are at most $i_{\max}$ are labeled significant, i.e., mapped to the alternative hypothesis. The BH procedure ensures that $\mathrm{FDR} \le \gamma$ for a desired threshold $\gamma$, regardless of how the observations under $H_1$ are generated.
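The steps above translate directly into a few lines of NumPy. This is a minimal sketch of the standard BH step-up rule, assuming the test statistics have already been computed and are uniform on [0, 1] under $H_0$; the function name is hypothetical.

```python
import numpy as np


def benjamini_hochberg(y, gamma):
    """BH step-up procedure on scalar test statistics y (uniform on [0, 1] under H0).

    Returns a boolean mask that is True for statistics declared significant (H1).
    """
    y = np.asarray(y, dtype=float)
    m = y.size
    order = np.argsort(y)                         # rank-order the test statistics
    line = gamma * np.arange(1, m + 1) / m        # threshold line i * gamma / m
    below = np.nonzero(y[order] <= line)[0]       # 0-based ranks on or under the line
    significant = np.zeros(m, dtype=bool)
    if below.size > 0:
        i_max = below[-1]                         # largest such rank: the decision point
        significant[order[: i_max + 1]] = True    # every rank up to i_max is significant
    return significant


# Sensors at indices 1 and 3 are declared significant at gamma = 0.05.
print(benjamini_hochberg([0.5, 0.001, 0.3, 0.009], gamma=0.05))
```

Because the rule picks the largest index under the line, it is a step-up procedure: for statistics (0.01, 0.03, 0.035, 0.04) at $\gamma = 0.05$ all four are declared significant, even though 0.03 lies above its own threshold 0.025.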

Thus the BH procedure [4, 5] is an adaptive thresholding procedure: the final stopping point is itself a random variable [16] that depends on the specific realization. Nevertheless, it can be shown that the BH procedure [4] is a distribution-invariant algorithm, i.e., regardless of $g_{1s}$, it controls the FDR below $\gamma$.

Theorem 2.2 For independent test statistics under the null hypothesis, and for any configuration of alternative hypotheses, the BH procedure controls the FDR at level $\gamma m_0 / m$, where $m_0$ is the number of true null hypotheses and $m$ is the number of observations.
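Theorem 2.2 can be sanity-checked by simulation. The sketch assumes independent statistics, $U[0,1]$ under $H_0$ and, as a hypothetical alternative, Beta(0.1, 1) under $H_1$ (its CDF $y^{0.1}$ is concave); the sizes, level, and function names are illustrative. With $m = 200$, $m_1 = 20$ and $\gamma = 0.1$, the estimate of $E\{V/R\}$ (using $V/R = 0$ when $R = 0$) should land near $\gamma m_0/m = 0.09$, a level the BH procedure attains with equality for independent, continuously distributed null statistics.

```python
import numpy as np


def bh_mask(y, gamma):
    """Compact BH step-up rule: True where the statistic is declared significant."""
    m = y.size
    order = np.argsort(y)
    below = np.nonzero(y[order] <= gamma * np.arange(1, m + 1) / m)[0]
    mask = np.zeros(m, dtype=bool)
    if below.size:
        mask[order[: below[-1] + 1]] = True
    return mask


def empirical_fdr(m=200, m1=20, gamma=0.1, trials=4000, seed=0):
    """Monte Carlo estimate of E[V / max(R, 1)] for the BH procedure.

    Null statistics are U[0, 1]; significant statistics are drawn from the
    hypothetical alternative Beta(0.1, 1), which clusters them near zero.
    """
    rng = np.random.default_rng(seed)
    is_null = np.arange(m) >= m1                   # the first m1 sensors carry objects
    fdp = np.empty(trials)
    for k in range(trials):
        y = np.where(is_null, rng.random(m), rng.beta(0.1, 1.0, size=m))
        rejected = bh_mask(y, gamma)
        v = np.count_nonzero(rejected & is_null)   # false discoveries
        r = np.count_nonzero(rejected)             # total discoveries
        fdp[k] = v / max(r, 1)                     # convention: V/R = 0 when R = 0
    return float(fdp.mean())


fdr_hat = empirical_fdr()
print(f"empirical FDR = {fdr_hat:.4f}, bound gamma*m0/m = {0.1 * 180 / 200:.4f}")
```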

For our purposes the error performance of the BH procedure is of relevance. In [16] it is shown that the BH procedure achieves the Bayes Oracle performance for reasonable signal-to-noise ratios and low levels of target density, even though the distribution under $H_1$ (only weak conditions are imposed on $g_{1s}$) as well as the actual number of objects may be unknown. A related result in [30] shows that adaptive procedures such as the BH procedure outperform fixed-threshold procedures. As seen in Figure 3, when we control the FDR criterion, the error rate closely tracks the error performance of the Bayes Oracle risk policy. In conclusion, the above exercise shows that adaptive procedures adapt their threshold to the object density, in contrast to non-adaptive procedures. In addition, they have inherent robustness properties that can be useful in our SNET setting. With this justification we adopt the FDR framework for our problem.

Figure 3: Monte Carlo error rate comparison of uncorrected testing, the Bayes Oracle, and the BH procedure for varying object density. 150 samples with distributions $N(0, 1)$ under $H_0$ and $N(0, 4)$ under $H_1$ were used.
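A comparison in the spirit of Figure 3 can be sketched as follows. The setup is our reading of the caption ($m = 150$, $N(0,1)$ under $H_0$, $N(0,4)$ under $H_1$) plus assumptions not stated there: each sensor carries an object independently with probability $p$, and both the uncorrected test and BH use level $\gamma = 0.05$ on a two-sided tail statistic. Names and parameter values are hypothetical, and this is not the code behind the figure.

```python
import math

import numpy as np


def error_rates(p, m=150, gamma=0.05, trials=500, seed=0):
    """Average fraction of misclassified sensors for three rules (Figure 3 style).

    Assumptions for this sketch: H0 is N(0, 1), H1 is N(0, 4) (standard deviation 2),
    each sensor carries an object independently with probability p, and the
    uncorrected test and BH both use level gamma.
    """
    rng = np.random.default_rng(seed)
    erfc = np.vectorize(math.erfc)
    # Bayes Oracle for N(0,1) vs N(0,4): declare H1 when p*g1(x) > (1-p)*g0(x),
    # which reduces to x**2 > (8/3) * log(2 * (1 - p) / p).
    c = (8.0 / 3.0) * math.log(2.0 * (1.0 - p) / p)
    totals = {"uncorrected": 0.0, "bayes_oracle": 0.0, "bh": 0.0}
    for _ in range(trials):
        h1 = rng.random(m) < p
        x = np.where(h1, 2.0, 1.0) * rng.standard_normal(m)
        # The likelihood ratio increases in |x|, so the null tail of the level set
        # is Y = Pr_H0(|Z| > |x|) = erfc(|x| / sqrt(2)), uniform under H0.
        y = erfc(np.abs(x) / math.sqrt(2.0))
        order = np.argsort(y)
        below = np.nonzero(y[order] <= gamma * np.arange(1, m + 1) / m)[0]
        bh = np.zeros(m, dtype=bool)
        if below.size:
            bh[order[: below[-1] + 1]] = True
        decisions = {"uncorrected": y <= gamma, "bayes_oracle": x**2 > c, "bh": bh}
        for name, dec in decisions.items():
            totals[name] += np.count_nonzero(dec != h1) / m
    return {name: total / trials for name, total in totals.items()}


for p in (0.02, 0.05, 0.1, 0.2):
    print(p, error_rates(p))
```

The Bayes Oracle uses $p$ directly, while BH never sees it; comparing the returned error rates across `p` shows how closely the adaptive threshold tracks the oracle in this particular model.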
2.3 Contributions

First, although the BH procedure ensures FDR control irrespective of the observed distribution in the presence of an object, the detection performance can vary widely for different distributions. In particular, the reference distribution corresponding to the null hypothesis can be transformed in several ways to obtain test statistics without changing the FDR control. This leaves room for optimizing detection performance. The problem is particularly acute in multi-dimensional settings, since it is difficult to rank-order the observations, whereas the BH procedure works on scalar test statistics, where there is a clear ordering. In [7] a coordinate-by-coordinate ordering is presented, but this is generally sub-optimal. Our first contribution is a transformation that maps multi-dimensional observations to scalar test statistics and thereby enables rank ordering of the observations and application of the BH procedure. We show that our test statistics are optimal in the sense that they lead to maximal detection power for a given level of FDR control.

A second contribution of our work is to show that our transformations and procedures are robust to perturbations of the distribution of observations. This is particularly important in the SNET setting due to path-loss effects. Indeed, due to the complex nature of attenuation and randomness in the environment, signals from objects far away can interfere with signals from objects in the immediate vicinity. The interfering signal is unknown, which motivates the development of robust techniques.

Our third contribution is a distributed, communication-efficient BH procedure for multi-object detection in SNETs. Our results indicate that, for a given FDR threshold, the communication message complexity grows in proportion to the actual number of sensors observing objects (significant sensors) while achieving the same performance as the centralized procedure. Namely, the communication costs scale with the event density for a pre-specified error performance and are independent of the network size.



3 Setup

We consider a non-Bayesian setting where an unknown number of objects are distributed over a sensor field of $m$ sensors. We consider a scheme in which the objects generate a signal field over the sensor network and the sensors sample the field at their locations. In this scheme the significant hypothesis ($H_1$) for a sensor is the event that the sensor is within a radius $d_0$ of an object, and the null hypothesis ($H_0$) is the event that the sensor is outside a radius $d_0$ from all objects. We assume a sparse distribution of objects, i.e., at most a single object is allowed to be present in the immediate vicinity of any sensor (we comment on how to generalize the analysis to handle multiple objects in the immediate vicinity later in this section). Note, however, that each object can be in the immediate vicinity of multiple sensors.

We call the radius $d_0$ the effective sensing range of a sensor; this is the radius within which the signal energy does not decay significantly. Note that this situation models both active and passive sensing scenarios. In active sensing, sensors transmit a waveform and the return signal undergoes path losses. In passive sensing, objects radiate signal patterns, which undergo path losses as well. Therefore, the object being in the vicinity of a sensor and the sensor being in the vicinity of an object are mathematically equivalent. The observations at each sensor are multidimensional to account for multi-modal sensors with different modalities, such as magnetic, seismic, and acoustic.

In this work we separate the problem of what to communicate from the problem of how to communicate by assuming a broadcast model, wherein each sensor, once it decides what to communicate, broadcasts that information to the entire SNET. The reason for separating these two problems is that, once we know what we want the network to compute, there are a number of methods that offer an efficient solution and only require communication connectivity [11, 31]. Consequently, communication complexity is the aggregate number of messages broadcast by the sensors. Our objective is a distributed decision rule that has low communication complexity and good error performance.

3.1 Mathematical Modeling of Multi-Modal Sensor Observations

We begin by considering an example of the following sensing model:

(1)

where $\theta_t$ is the multidimensional signal (possibly random with known distribution) of object $t$, and $d(s,t)$ is the distance between sensor $s$ and object $t$. The minimum distance $d_{\min}$ is the distance below which the path-loss model does not hold and the signal saturates. The model above (with $d_{\min}$ and one in the denominator) is a simplified model that accounts for both near-field and far-field effects and ensures that the received signal power is not larger than the radiated power whenever the object is in the close vicinity of a sensor. The parameter $\alpha$ is the power decay exponent of the path loss, and $v_s$ is the multidimensional noise variable of known distribution.

Note that each sensor can comprise multiple modalities, such as electromagnetic (EM), acoustic (AC), and seismic (SE). Thus, with $'$ denoting transpose, the parameter $\theta_t$ above (possibly random with known distribution) can be decomposed as

$$\theta_t = \big(\theta_t^{EM}, \theta_t^{AC}, \theta_t^{SE}\big)'$$

Note that $d_{\min}$ can also be different for different modalities; however, for simplicity of notation we assume that it is the same for all modalities. With this observation model, the hypotheses are as follows. Note that observations at each sensor are conditionally independent when conditioned on the underlying hypothesis.

$$H_s = H_0 : d(s,t) > d_0 \text{ for all objects } t$$
$$H_s = H_1 : d(s,t) \le d_0 \text{ for an object } t$$

The distance $d_0$ is typically chosen to be the distance at which the signal power relative to the noise power is sufficiently large. Therefore, in general $d_0$ is close to $d_{\min}$ for large attenuation coefficients $\alpha$.

For simplicity of exposition we assume in this paper that only one object can be present within the distance $d_0$. This is usually satisfied when a sparse set of objects is distributed in the sensing field. However, we briefly discuss how these techniques can be generalized to handle multiple objects within $d_0$. Multiple objects can be incorporated by using an extended hypothesis space along the lines of [8]. There are two cases to consider: (a) multiple objects lead to sufficiently different signal patterns; (b) multiple objects do not lead to sufficiently different signal patterns. In the first case the hypothesis space can be expanded to account for multiple objects. The main idea is to have multiple null and significant hypotheses for each sensor $s$. The $k$th null hypothesis $H_0^k$ at sensor $s$ corresponds to the hypothesis that there are fewer than $k$ objects in the vicinity of the sensor, while the $k$th significant hypothesis $H_1^k$ corresponds to the hypothesis that there are exactly $k$ objects in the vicinity. For observations distributed as exponential random variables, the generalized maximum likelihood test statistics are independent conditioned on the null hypothesis. This fact is sufficient to apply Theorem 2.2 and quantify the performance of the BH procedure. Thus this idea can be integrated with the distributed sensors to form an expanded hypothesis set that meets the conditions required by the BH procedure. In the second case, when multiple objects do not result in sufficiently different signal patterns, the robustness techniques developed in this paper apply.

In summary, hypotheses are associated with each sensor. Hypothesis $H_1$ at sensor $s$ corresponds to the existence of an object within a radius $d_0$ of the sensor, while hypothesis $H_0$ corresponds to no object within radius $d_0$. An important point to note here is that each object can be in the vicinity of multiple sensors, so the hypothesis $H_1$ at multiple sensors can result from the same object.

Ideal Sensing Model (Case of Infinite Attenuation): As the attenuation coefficient $\alpha$ gets larger, the distribution of observations takes a nominal form: within a radius $d_{\min}$ of an object $t$, the received signal has mean $\theta_t$ (if $\theta_t$ is a non-random parameter), and outside this radius the received signal has negligible mean. Thus sensors within the minimum radius $d_{\min}$ of the object receive the full signal power, while sensors outside this radius receive negligible power. This leads to the following simplified model:

$$H_s = H_0 : X_s = v_s$$
$$H_s = H_1 : X_s = \theta_t + v_s$$

Throughout the paper we refer to the first model as the non-ideal sensing model and to the limiting case as the ideal sensing model. Figure 4 illustrates these models for a hypothetical modality.

Figure 4: Ideal and non-ideal sensing models. (a) Ideal sensing model: the objects have a constant signal over a confined region. (b) Non-ideal sensing model: the signal decays and is not confined.
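The ideal sensing model is easy to simulate, which can help when experimenting with the procedures that follow. The sketch assumes planar sensor and object positions, i.i.d. Gaussian noise $v_s$, and deterministic object signals $\theta_t$ with three hypothetical modalities; it enforces the sparse-object assumption (at most one object within $d_{\min}$ of any sensor) by raising an error. All names and values are illustrative.

```python
import numpy as np


def ideal_sensing_observations(sensor_xy, object_xy, theta, d_min, noise_sd, rng):
    """Draw multi-modal observations under the ideal (infinite-attenuation) model.

    sensor_xy: (m, 2) sensor positions; object_xy: (k, 2) object positions;
    theta: (k, n_modes) signal vector of each object (e.g. EM, AC, SE components).
    Noise is assumed i.i.d. Gaussian with standard deviation noise_sd.
    Returns X of shape (m, n_modes) and a boolean array h1 (True under H1).
    """
    dist = np.linalg.norm(sensor_xy[:, None, :] - object_xy[None, :, :], axis=2)  # (m, k)
    within = dist <= d_min
    if np.any(within.sum(axis=1) > 1):
        # Sparse-object assumption: at most one object within d_min of any sensor.
        raise ValueError("more than one object within d_min of a sensor")
    h1 = within.any(axis=1)
    nearest = np.argmin(dist, axis=1)                     # the object a sensor would see
    signal = np.where(h1[:, None], theta[nearest], 0.0)   # theta_t under H1, 0 under H0
    x = signal + noise_sd * rng.standard_normal(signal.shape)
    return x, h1


rng = np.random.default_rng(0)
sensors = rng.uniform(0.0, 10.0, size=(50, 2))
objects = np.array([[2.0, 2.0], [7.0, 8.0]])
theta = np.array([[3.0, 1.0, 0.5], [2.0, 2.0, 2.0]])  # hypothetical 3-modality signals
X, h1 = ideal_sensing_observations(sensors, objects, theta, d_min=1.5, noise_sd=1.0, rng=rng)
print(X.shape, int(h1.sum()), "sensors under H1")
```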

These examples can be abstracted to a more general formulation, where the noise is no longer modeled as additive 
and 6 t can belong to an arbitrary distribution. The cumulative probability distribution (resp. density) function of the 
observation vector X s at sensor s under each hypothesis H s = Hi, i = 0, 1 is denoted by Gi S (-) (resp. where 
H s denotes the hypothesis at sensor s. In the non-ideal case we have two composite family of continuous distributions, 
9is{-) £ Qii i = 1, 2. Observations at each sensor is conditionally independent when conditioned on the underlying 
hypothesis. The observations are denoted by a vector X s = (X^X 2 , . . . ,Xg), s G S, where d is the number of 
dimensions, X J S represents the j th dimension of the measurement taken by sensor s G S, and S represents the set of 
sensors that form the SNET. The realization of observation vector X s is denoted by x s = (x l s ,x 2 s , . . . , xf), s G S. 



We let Sq = {s G S : H s = Hq} with cardinality mo and S\ = {s G S : H s = Hi} with cardinality mi. Here both 
mo and mi are unknown and the object locations are assumed to be arbitrary, i.e. not necessarily uniformly distributed. 

Note that the class of distributions Go s and Gu are singletons in the ideal model, regardless of the dimensionality 
of the observed signal, if we assume that no two objects are within the radius d m i n of a single sensor. As we discussed 
earlier, this is usually true for sparse set of objects in a sensing field. Our approach is to develop results first for the ideal 
sensing model, where the families of distributions under the two hypothesis are characterized by singletons. We deal 
with the more general case of a < oo from a robustness perspective, i.e., as a perturbation of the ideal sensing model, 
in the upcoming parts of the paper. 

4 Proposed Test Statistics 

In this section we describe the proposed statistic and establish some important properties. We show that these test 
statistics can be used to perform detection through BH procedure, and allow for control of FDR at desired levels. 
With our definition, the CDF of test statistics under significant hypotheses becomes a concave function. Based on this 
concavity property, we can devise a scalable distributed procedure that achieves the detection power of its centralized 
counterpart. In the remainder of this section we propose a definition of test statistics. The test statistics transform 
multi-dimensional observations to scalar statistics and are based on volumes of level sets of the likelihood ratio function 
(more precisely the Radon-Nikodym derivative). These test statistics result in: 

(a) The test statistics under null hypotheses are uniformly distributed in [0, 1], 

(b) The test statistics under significant hypotheses are "maximally" clustered around zero. Consequently, thresholds 
near zero lead to detections with relatively few false alarms. 

To this end, consider the two hypotheses at sensor $s$:

$$H_s = H_0:\; X_s \sim g_{0s} \in \mathcal{G}_{0s}, \qquad H_s = H_1:\; X_s \sim g_{1s} \in \mathcal{G}_{1s}.$$

Let $\mu_{0s}$ and $\mu_{1s}$ be the measures associated with the distributions $G_{0s}$ and $G_{1s}$, respectively. We assume throughout this work that $\mu_{1s}$ is absolutely continuous with respect to $\mu_{0s}$, denoted $\mu_{1s} \ll \mu_{0s}$. Let $\phi_s = d\mu_{1s}/d\mu_{0s}$ be the Radon-Nikodym derivative, i.e., the likelihood ratio function. Define the following transformation of the random variable $X_s$ from $n$-dimensional space onto one-dimensional space:

$$Y_s = \chi_s(X_s) = \mu_{0s}\{x : \phi_s(x) > \phi_s(X_s)\} \qquad (2)$$
where we assume that $\phi_s$ is nowhere constant.² The nowhere-constant assumption holds, for example, when the involved distributions are Gaussians with different means or different variances. In this definition, the set $\{x : \phi_s(x) > \phi_s(X_s)\}$ is the most powerful decision region whose probability of false alarm does not exceed some $\gamma_0$; i.e., it solves $\arg\max_A \mu_{1s}(A)$ subject to $\mu_{0s}(A) \le \gamma_0$ for some $\gamma_0 \in (0, 1)$. Similarly, let $\mu_{1s}\{x : \phi_s(x) > \phi_s(X_s)\} = \gamma_1$ for some $\gamma_1$. Then the set $\{x : \phi_s(x) > \phi_s(X_s)\}$ is also the solution to $\arg\min_A \mu_{0s}(A)$ subject to $\mu_{1s}(A) \ge \gamma_1$; i.e., $Y_s$ is the $\mu_{0s}$-volume of the so-called minimum volume set at level $\gamma_1$. All of these results follow from the fact that the Radon-Nikodym derivative is precisely the likelihood ratio function.
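As a rough illustration of Equation (2), the following Python sketch estimates $Y_s$ by Monte Carlo: it draws samples from the nominal null $g_{0s}$ and counts how often the likelihood ratio exceeds its value at the observation. Purely for illustration, it assumes the three-dimensional Gaussian pair used later in Section 7.1 ($g_{0s} = N(\mathbf{0}, I_3)$, $g_{1s} = N(\mathbf{1.5}, I_3)$), for which $\phi_s(x') > \phi_s(x)$ exactly when $\sum_j x'_j > \sum_j x_j$, so the closed form $Y_s = 1 - \Phi(\sum_j x_j/\sqrt{3})$ is available as a check. All function names are hypothetical.

```python
import random
from statistics import NormalDist

THETA, DIM = 1.5, 3  # assumed example: g0 = N(0, I_3), g1 = N(THETA * 1, I_3)


def log_phi(x):
    """Log likelihood ratio log(g1(x) / g0(x)) for the assumed Gaussian pair."""
    return THETA * sum(x) - DIM * THETA ** 2 / 2


def sample_null(rng):
    """Draw one observation from the nominal null g0 = N(0, I_3)."""
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def chi_monte_carlo(x, n_samples=20000, seed=0):
    """Estimate Y = mu_0{x' : phi(x') > phi(x)} by sampling x' from the null.

    Comparing log(phi) is equivalent to comparing phi, since log is increasing.
    """
    rng = random.Random(seed)
    level = log_phi(x)
    hits = sum(log_phi(sample_null(rng)) > level for _ in range(n_samples))
    return hits / n_samples


def chi_closed_form(x):
    """Exact Y for this example: phi increases with sum(x), and sum(x) ~ N(0, 3) under H0."""
    return 1.0 - NormalDist().cdf(sum(x) / DIM ** 0.5)


x_obs = [0.5, 1.0, 2.0]
print(chi_monte_carlo(x_obs), chi_closed_form(x_obs))  # Monte Carlo estimate vs. exact value
```

The Monte Carlo route only needs the ability to sample from $g_{0s}$ and evaluate $\phi_s$, which is why it is a natural fallback when no closed form exists.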

We now give an example that depicts graphically the impact of the transformation on a one-dimensional distribution. Assume that we are given two distributions whose densities are $g_{0s}$ and $g_{1s}$, as depicted in Figure 5(a). We first calculate the Radon-Nikodym derivative $\phi_s$, as depicted in Figure 5(b). For a given $X_s$ we can now obtain $\phi_s(X_s)$. We can next identify the set $\{x : \phi_s(x) > \phi_s(X_s)\}$ and obtain $Y_s = \chi_s(X_s) = \mu_{0s}\{x : \phi_s(x) > \phi_s(X_s)\}$, as depicted in Figure 5(c). The same intuition holds for multidimensional observations as well.

Figure 5: Depiction of how to obtain the test statistic with a one-dimensional example: (a) $g_{0s}$ and $g_{1s}$; (b) $\phi_s$ and $g_{0s}$; (c) the set $\{x : \phi_s(x) > \phi_s(X_s)\}$ and $Y_s = \mu_{0s}\{x : \phi_s(x) > \phi_s(X_s)\}$.

The problem of obtaining test statistics for multidimensional random variables has received attention from various researchers. It is worth noting that the test statistics we propose through the transformation are distributed U[0, 1] under the null hypotheses (which we establish in Section 4.1). Note that transformations that map the null distribution to the uniform distribution are not unique. For example, in [7] the authors propose to obtain test statistics in a dimension-by-dimension manner, and in [25] a minimum volume set approach is taken. Some of these transformations are compared in Section 7. Our method is relatively simple to implement and, among such transformations, yields the maximal detection power of the BH procedure (Section 4.2).

4.1 FDR Control Using χ

We now establish that using the test statistics obtained through $\chi$ as input to the BH procedure guarantees FDR control below any desired level. We show this by establishing that under the null hypothesis the test statistics $Y_s = \chi_s(X_s)$ are uniformly distributed.

Let $Y_{0s} = \chi_s(X_s) \sim F_{0s}$ be the random variable when $X_s \sim G_{0s}$, and let $Y_{1s} = \chi_s(X_s) \sim F_{1s}$ be the random variable when $X_s \sim G_{1s}$. We begin by establishing that $Y_{0s} \sim U[0, 1]$, which implies that we can control the FDR at desired levels through the BH procedure if we use the $Y_s$ as test statistics.

Theorem 4.1 If the derivative $\phi_s = d\mu_{1s}/d\mu_{0s}$ is nowhere constant, then under the null hypothesis $Y_s = \chi_s(X_s)$ of Equation (2) is uniformly distributed in [0, 1]; i.e., $Y_{0s} \sim U[0, 1]$.

Proof: See appendix. ■ 

Corollary 4.2 The BH procedure applied to $Y_s = \chi_s(X_s) \in [0, 1]$, $s = 1, 2, \ldots, m$, of Equation (2) controls the FDR.

² If $\phi_s$ is constant in some regions, then the transformation can be modified to $Y_s = \chi_s(X_s) = \mu_{0s}\{x : \phi_s(x) > \phi_s(X_s)\} + \psi$, where $\psi \sim U(0, \beta_s)$ is a random variable used as a dither with amplitude $\beta_s = \mu_{0s}\{x : \phi_s(x) = \phi_s(X_s)\}$. This dither is analogous to randomized decision rules used in detection theory [27]. The results developed in the paper are valid for this general case, but the proofs are more involved.
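A minimal sketch of the dithered transformation in footnote 2, assuming (only for illustration) a discrete observation with $X_s \sim \mathrm{Binomial}(5, 0.3)$ under $H_0$ and $\mathrm{Binomial}(5, 0.6)$ under $H_1$. Here $\phi_s$ takes finitely many values, so the undithered statistic cannot be uniform; adding $\psi \sim U(0, \beta_s)$ spreads each atom over an interval. The binomial model and all names are hypothetical.

```python
import math
import random

N, P0, P1 = 5, 0.3, 0.6  # assumed discrete example, not taken from the paper
SUPPORT = range(N + 1)


def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


PMF0 = {k: binom_pmf(k, N, P0) for k in SUPPORT}
PMF1 = {k: binom_pmf(k, N, P1) for k in SUPPORT}
PHI = {k: PMF1[k] / PMF0[k] for k in SUPPORT}  # likelihood ratio on the support


def dithered_chi(x, rng):
    """mu_0{phi > phi(x)} plus a uniform dither of amplitude beta = mu_0{phi = phi(x)}."""
    above = sum(PMF0[k] for k in SUPPORT if PHI[k] > PHI[x])
    beta = sum(PMF0[k] for k in SUPPORT if PHI[k] == PHI[x])  # exact ties only
    return above + rng.uniform(0.0, beta)


def sample_binomial(rng, n, p):
    return sum(rng.random() < p for _ in range(n))


rng = random.Random(0)
null_stats = [dithered_chi(sample_binomial(rng, N, P0), rng) for _ in range(20000)]
print(sum(y <= 0.2 for y in null_stats) / len(null_stats))  # fraction of null statistics <= 0.2
```

The tie test uses exact float equality, which is adequate here because the likelihood ratio is strictly monotone in $x$; with genuinely tied regions one would group support points by value instead.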
As we mentioned in Section 2, the main insight behind the proof of Theorem 2.2 lies in the simple fact that the test statistics under the null hypotheses are independent and uniformly distributed. The corollary then follows readily from Theorem 4.1.
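For reference, the centralized BH procedure that Corollary 4.2 refers to can be sketched in a few lines of Python: sort the test statistics, find the largest index $k$ with $Y_{(k)} \le k\gamma/m$, and declare the $k$ smallest statistics significant. This is the standard step-up form; the function name is our own.

```python
def benjamini_hochberg(y, gamma):
    """Return the sorted indices declared significant by the BH step-up rule."""
    m = len(y)
    order = sorted(range(m), key=lambda s: y[s])  # indices by increasing statistic
    k = 0
    for i, s in enumerate(order, start=1):
        if y[s] <= i * gamma / m:
            k = i  # keep the LAST crossing of the line i * gamma / m
    return sorted(order[:k])


y = [0.041, 0.001, 0.212, 0.008, 0.06, 0.039, 0.205, 0.042, 0.216, 0.074]
print(benjamini_hochberg(y, 0.05))  # [1, 3]
print(benjamini_hochberg(y, 0.25))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With $\gamma = 0.25$ the eighth smallest statistic (0.205) lies above its threshold 0.2, yet all ten are declared significant because the last crossing occurs at $k = 10$. This last-crossing behavior is what Section 5 contrasts with a first-crossing rule.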

4.2 Optimality Results for Transformation χ

Here we prove that $\chi$ is the optimal transformation in the sense that it maximizes the detection power of the BH procedure subject to any FDR constraint $\gamma$. We also establish that under the significant hypothesis, the distribution of $Y_s = \chi(X_s)$ is concave, which leads to the important result that the optimal decision rule in the space of $Y_s$ is a thresholding rule. This result is important from the distributed detection perspective, as will become clear in the upcoming sections.

The following theorem is needed to formalize the fact that the transformation of Equation (2) leads to the maximal detection power of the BH procedure.

Theorem 4.3 Let $Z_s = \tilde{\chi}_s(X_s)$ be any test statistic obtained from the observations $X_s$ such that $Z_s \sim U[0, 1]$ under the null hypothesis. If $Y_s = \chi_s(X_s)$ of Equation (2), then $\Pr\{Y_s \le \gamma_s\} \ge \Pr\{Z_s \le \gamma_s\}$, where the probability measure is $\mu_s = \pi\mu_{1s} + (1 - \pi)\mu_{0s}$ for some mixture parameter $\pi$.

Proof: See appendix. ■
We next establish that $F_{1s}$, the distribution of $Y_{1s}$, is concave; i.e., the density function $f_{1s}$ is monotone decreasing in [0, 1]. This result has strong implications for the detection power and scalability of the distributed algorithm.

Theorem 4.4 If the derivative $\phi_s = d\mu_{1s}/d\mu_{0s}$ is nowhere constant, then $F_{1s}$ is concave.

Proof: See appendix. ■ 
We next present an optimality result over a family of testing procedures. Suppose $u_\gamma(\cdot)$, $\gamma \in [0, 1]$, is a family of testing procedures such that $u_\gamma(\cdot)$ controls the false alarm probability at level $\gamma$. Let $A_\gamma$ be the set of observations $X_s$ that are accepted as significant, i.e.,

$$A_\gamma = \{X_s : u_\gamma(X_s) = H_1\}.$$

For each observation $X_s$, define the mapping

$$\chi(X_s) = \inf\{\gamma : X_s \in A_\gamma\} \in [0, 1]. \qquad (3)$$

It is easy to check that, under suitable technical conditions, this mapping is Borel measurable and induces a uniform measure on [0, 1] under the null hypothesis. We then have the following result.

Corollary 4.5 Let $R = [0, \gamma]$ be a decision region such that if $Y_s \in R$ we decide $H_s = H_1$; otherwise we decide $H_s = H_0$. If the $Y_s$ are obtained through the transformation $\chi(\cdot)$, the decision region $R$ maximizes the probability of detection subject to a probability of false alarm constraint $\gamma$ over any other family of decision rules $u_\gamma(\cdot)$ defined above.

Corollary 4.5 follows immediately from Theorems 4.3 and 4.4. Recall that $F_{0s}$ is the uniform distribution on [0, 1], so any set in [0, 1] of Lebesgue measure $\gamma$ has probability of false alarm exactly $\gamma$. Since, according to Theorem 4.4, $F_{1s}$ is concave, its density is monotone decreasing. Therefore, among the sets of length $\gamma$, $R = [0, \gamma]$ carries the most mass under $F_{1s}$. Furthermore, as a consequence of Theorem 4.3, among all transformations that generate a uniform $F_{0s}$, $\chi$ maximizes $F_{1s}(\gamma)$, and the corollary follows.
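The argument can be spot-checked numerically in a simple case. The Python sketch below assumes the one-dimensional pair $g_{0s} = N(0, 1)$, $g_{1s} = N(3, 1)$ (the pair used for Figure 6(a)), for which Equation (2) gives $Y_s = 1 - \Phi(X_s)$ and hence $F_{1s}(y) = 1 - \Phi(\Phi^{-1}(1 - y) - 3)$. It verifies on a grid that $F_{1s}$ is concave and that, among intervals of length $\gamma$ inside [0, 1], the interval $[0, \gamma]$ carries the most mass under $F_{1s}$. This checks one example; it is not a proof.

```python
from statistics import NormalDist

STD = NormalDist()
MU = 3.0  # assumed mean shift under H1


def F1(y):
    """CDF of Y_1s = 1 - Phi(X) when X ~ N(MU, 1)."""
    if y <= 0.0:
        return 0.0
    if y >= 1.0:
        return 1.0
    return 1.0 - STD.cdf(STD.inv_cdf(1.0 - y) - MU)


# Concavity: second differences on a uniform grid must be non-positive.
grid = [i / 200 for i in range(1, 200)]
second_diffs = [F1(a) - 2 * F1(b) + F1(c) for a, b, c in zip(grid, grid[1:], grid[2:])]
is_concave = all(d <= 1e-12 for d in second_diffs)

# Mass of the sliding interval [a, a + gamma] under F1, for a = 0.00, 0.01, ..., 0.90.
gamma = 0.1
window_mass = [F1(a / 100 + gamma) - F1(a / 100) for a in range(0, 91)]
best_start = max(range(len(window_mass)), key=window_mass.__getitem__) / 100
print(is_concave, best_start)  # True 0.0
```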

We now state an important corollary regarding the maximal detection power of BH procedure with the proposed 
test statistics. 

Corollary 4.6 The detection power of the BH procedure when applied to $\chi$ of Equation (2) is larger than that obtained with any other transformation $\tilde{\chi}$ for which the null distribution is also U[0, 1].

Since in the BH procedure each test statistic $Y_s$ is compared against a threshold of the form $i\gamma/m$, this result follows immediately from Corollary 4.5.
4.3 Convexity Properties of Ordered Test Statistics



Next we present another important implication of Theorem 4.4. We show that the expected values of the ordered test statistics are samples of a convex function, an important property for designing the scalable distributed detection algorithm. In fact, it is due to this result that we can achieve centralized performance through a decentralized method.

Corollary 4.7 Let $Y_{(1)}, Y_{(2)}, \ldots, Y_{(m)}$ be the rank-ordered test statistics such that $Y_{(i)}$ is the $i$th smallest of $Y_s$, $s = 1, 2, \ldots, m$. Then $E(Y_{(i)})$, $i = 1, 2, \ldots, m$, are samples of a convex function, i.e., $E(Y_{(i)}) \le \big(E(Y_{(i-1)}) + E(Y_{(i+1)})\big)/2$ for $i = 2, 3, \ldots, m - 1$, asymptotically as $m \to \infty$.

Asymptotically, $E(Y_{(i)}) = F_s^{-1}\big(i/(m + 1)\big)$ [9], where $F_s = \pi F_{1s} + (1 - \pi) F_{0s}$ for any mixing parameter $\pi$. According to Theorem 4.4, $F_{1s}$ is a concave distribution. Since $F_{0s}$ is uniform, $F_s$ is also concave, and therefore $F_s^{-1}$ is convex. The corollary then follows.
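A small numerical illustration of this step, under the same assumed pair $N(0, 1)$ versus $N(3, 1)$ and an assumed mixing parameter $\pi = 0.3$: the Python sketch inverts $F_s = \pi F_{1s} + (1 - \pi) F_{0s}$ by bisection at the points $i/(m + 1)$ and checks that the resulting sequence has non-negative second differences, i.e., that it samples a convex function.

```python
from statistics import NormalDist

STD = NormalDist()
MU, PI = 3.0, 0.3  # assumed signal mean and mixing parameter


def F1(y):
    """CDF of Y_1s = 1 - Phi(X) with X ~ N(MU, 1); concave by Theorem 4.4."""
    if y <= 0.0:
        return 0.0
    if y >= 1.0:
        return 1.0
    return 1.0 - STD.cdf(STD.inv_cdf(1.0 - y) - MU)


def F_mix(y):
    """Mixture CDF F_s = PI * F1 + (1 - PI) * y, with F_0s uniform."""
    return PI * F1(y) + (1.0 - PI) * y


def F_mix_inv(u, iters=60):
    """Invert the increasing function F_mix on [0, 1] by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F_mix(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


m = 50
expected_order_stats = [F_mix_inv(i / (m + 1)) for i in range(1, m + 1)]
second_diffs = [a - 2 * b + c for a, b, c in zip(expected_order_stats,
                                                  expected_order_stats[1:],
                                                  expected_order_stats[2:])]
print(min(second_diffs) >= -1e-9)  # True
```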



5 Distributed Detection Algorithm 

In this section we present the distributed detection algorithm. Our algorithm has the property that the communication cost scales with the number of sensors that observe an object, not with the total number of sensors in the SNET. We also establish the equivalence of the distributed detection algorithm to its centralized counterpart, the BH procedure, using the switching relation [1] and the Chernoff bound. First, we describe our distributed algorithm.

Observe that the BH procedure requires ordering of the test statistics. Since ordering is not cost-efficient in terms of communications, we use a sequential method to implement the linearly increasing thresholds of the BH procedure. For reasons discussed earlier, we take the communication complexity to be the number of broadcast messages.



Distributed BH Algorithm: First, each sensor obtains its test statistic $y_s$ through the proposed transformation. Every sensor carries an indicator variable $\xi_s(t)$ such that $\xi_s(t) = 1$ if sensor $s$ has not transmitted a decision before iteration $t$, and $\xi_s(t) = 0$ otherwise. Sensors also carry a decision variable $\rho_s(t)$ such that $\rho_s(t) = 1$ if sensor $s$ decides $H_1$, and $\rho_s(t) = 0$ otherwise. At iteration $t$ each sensor has a threshold variable $l(i_t) = i_t\gamma/m$ and a bit counter $\mathrm{count}_t$. Initialize $i_1 = 1$ and $\mathrm{count}_0 = 0$. Then:

1. Sensor $s$ decides $H_s = H_1$ if $y_s \le l(i_t)$ and $H_s = H_0$ otherwise ($\rho_s(t)$ takes the corresponding value).

2. Sensor $s$ announces its decision to the network only if $\xi_s(t)\rho_s(t) = 1$.

3. Assume $r_t$ sensors announce $H_1$ to the network in this iteration. Set $i_{t+1} = i_t + 1$ and $\mathrm{count}_t = \mathrm{count}_{t-1} + r_t$.

4. If $\mathrm{count}_t \ge i_t$, set $t_{\max} = t$.

5. If $i_t = m$ or $r_t = 0$, label the sensors that declared $H_s = H_1$ up to iteration $t_{\max}$ as observing an object and quit the algorithm; otherwise go to step 1.
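The following Python sketch simulates the steps above on a single machine, counting each announcement as one broadcast message. It is an illustration under our own assumptions, not a networked implementation: the function and parameter names are hypothetical, `k_preset` mimics the $k$ preset tests of Section 5.1 (the stopping rule $r_t = 0$ is ignored while $t < k$), and `budget` implements the cap of $C$ messages discussed below. With `k_preset` equal to $m$ every threshold is visited, and the output coincides with the centralized BH procedure.

```python
def centralized_bh(y, gamma):
    """Reference BH step-up rule (last crossing), for comparison."""
    m = len(y)
    order = sorted(range(m), key=lambda s: y[s])
    k = max((i for i, s in enumerate(order, 1) if y[s] <= i * gamma / m), default=0)
    return sorted(order[:k])


def distributed_bh(y, gamma, k_preset=1, budget=None):
    """Simulate the sequential (first-crossing) distributed BH algorithm.

    Returns (indices labeled as observing an object, number of broadcast messages).
    """
    m = len(y)
    announced = [False] * m  # announced[s] is True once xi_s(t) has dropped to 0
    count, t_max = 0, 0
    for t in range(1, m + 1):  # i_t = t
        threshold = t * gamma / m  # l(i_t)
        r_t = 0
        for s in range(m):
            if budget is not None and count >= budget:
                break  # communication cap C reached
            if not announced[s] and y[s] <= threshold:  # xi_s(t) * rho_s(t) == 1
                announced[s] = True
                r_t += 1
                count += 1
        if count >= t:  # step 4: count_t >= i_t
            t_max = t
        if budget is not None and count >= budget:
            break
        if t == m or (r_t == 0 and t >= k_preset):  # step 5 stopping rule
            break
    final_threshold = t_max * gamma / m
    labeled = [s for s in range(m) if announced[s] and y[s] <= final_threshold]
    return labeled, count


y = [0.041, 0.001, 0.212, 0.008, 0.06, 0.039, 0.205, 0.042, 0.216, 0.074]
print(distributed_bh(y, 0.25))               # ([0, 1, 3, 4, 5, 7, 9], 7)
print(distributed_bh(y, 0.25, k_preset=10))  # ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 10)
```

The first call stops at $t = 4$ because no new statistic crosses $4\gamma/m = 0.1$, while the centralized rule also declares 0.205, 0.212 and 0.216, which only cross at $t = 9$. This is the finite-sample gap between first and last crossing that the preset tests of Section 5.1 address.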

The distributed algorithm described above leads to the same decision rule as the centralized BH procedure (with high probability in finite samples; see Section 5.1). However, when there is a communication constraint of $C$ messages, we only need to put a cap on the count variable and perform the distributed BH algorithm while $\mathrm{count}_t < C$. Observe that due to the concavity of $F_{1s}$ (Theorem 4.4), capping the count variable does not increase the FDR. In addition, since the expected test statistics are samples of a convex function (Corollary 4.7), capping the count variable amounts to performing the distributed detection algorithm with a smaller threshold $\gamma' < \gamma$, and hence the FDR is generally smaller. Capping the count variable has the adverse effect of fewer detections. Our main point here is that the FDR and the detection power degrade gracefully due to the monotonicity properties of the transformation. In other words, a smaller count does result in fewer detections, but in a proportionate manner.



Figure 6(a) demonstrates this effect with a simple Monte Carlo study. For this demonstration we used $m_1 = 300$, $m = 1000$, $g_{0s} = N(0, 1)$, $g_{1s} = N(3, 1)$, with the transformation of Equation (2). We then varied $C$, the communication bit budget, between 20 and 920 in increments of 100, and plotted the actual FDR as a function of $C$ for various values of the FDR threshold $\gamma$. We will extensively simulate error rates (false alarms and misses) in Section 7. We note here that the detection power decreases accordingly when we cap the count variable. It was shown in [4] that the BH procedure controls the FDR at $\gamma m_0/m$. Depending on the unknown value of $m_0$, there may be an inherent conservatism in this strategy. We have analyzed an alternative strategy in [10], wherein at each update of the $\mathrm{count}_t$ variable the actual number of objects, $m_1$, is sequentially estimated based on the number of sensors that choose $H_s = H_1$ at stage $t$. The threshold $l(i_t)$ is then adjusted based on the estimated object density. Our simulation results indicated that this strategy leads to much better detection power, as seen in Figure 6(b).

Figure 6: (a) Effect of capping the communications (FDR versus bit budget); (b) effect of learning the object density. Left: We capped the communication variable $\mathrm{count}_t$ at {20, 120, 220, ..., 920} for various FDR levels $\gamma$. Observe that the smaller the bit budget ($C$), the smaller the actual FDR. Notice that the FDR tapers off at the level $\gamma m_0/m$. Here $m_0 = 700$, $m = 1000$. Right: Threshold update through learning: the sequential estimate of $m_0$ allows for a sequential update of the threshold $\gamma$ for better detection rates. The distributions under the null and significant hypotheses were $N(0, 1)$ and $N(0, 3)$, respectively.

5.1 Distributed BH Algorithm: Optimal Communication Cost with Centralized Performance 

We next establish an important scaling property of the BH procedure. It is due to this property that we can limit the 
communication budget with an upper bound that depends on the number of sensors that have an object within their 
sensing range, and not the total number of sensors in the SNET. 

Theorem 5.1 Let $m_1 = m - m_0$ be the number of sensors with significant hypotheses. The expected ratio of $m_1$ to the number of sensors that are declared significant ($R$) is lower bounded by $1 - \gamma$; i.e., $E\{m_1/R\} \ge 1 - \gamma$.

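Proof: Let $V$ and $Z$ denote the numbers of false and correct detections, respectively, so that $R = V + Z$. We know that the BH procedure guarantees $E\{V/R\} = E\{V/(V + Z)\} \le \gamma$. Evidently $Z \le m_1$, meaning the number of correct detections cannot exceed the number of sensors with significant hypotheses. Then

$$E\{V/(V + Z)\} = 1 - E\{Z/(V + Z)\} \le \gamma \;\Longrightarrow\; E\{Z/(V + Z)\} \ge 1 - \gamma,$$

$$1 - \gamma \le E\{Z/(V + Z)\} \le E\{m_1/(V + Z)\} = E\{m_1/R\},$$

which concludes the proof of this theorem. ■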
We caution the reader that the above bound does not guarantee detection performance. It only indicates that the number of eventual detections is controlled by the number of sensors with significant hypotheses, $m_1$, in the sense of the ratio bound above, and not by the SNET size.

Equivalence to Centralized Algorithm: The communication efficiency of our distributed BH procedure stems from the fact that it is a first-crossing procedure, in contrast to the optimal centralized BH procedure, which is a last-crossing procedure. Last-crossing procedures are inherently inefficient: although the number of eventual detections is typically smaller than the number of objects according to Theorem 5.1, in general we need to aggregate data from all the sensors to determine the last crossing. On the other hand, by definition, first-crossing procedures must stop before the last crossing. Moreover, the FDR of the first crossing is bounded from above by that of the last crossing. Consequently, the number of detections (and hence the communication cost) of the first-crossing procedure must be smaller than the actual number of objects. Nevertheless, there may be a loss in performance. Below we argue that with our proposed transformation, first- and last-crossing procedures lead to the same performance. This establishes the optimality of the distributed BH procedure in terms of both communication efficiency and performance.

In the asymptotic case, the first-crossing and the last-crossing procedures are the same and terminate at the same decision point. This is because, according to Corollary 4.7, the ordered test statistics are asymptotically samples of a convex function. However, this is not true in finite-sample cases, and therefore the first-crossing and the last-crossing procedures can have different termination points. Below we show that the distributed BH procedure achieves the last-crossing performance with high probability.

Theorem 5.2 Let $Y_{(k)}$ be the $k$th smallest test statistic. If $E(Y_{(k)}) < l(k) = k\gamma/m$, then $\Pr\{Y_{(k)} > l(k)\}$ decays exponentially fast with $k$.

Proof: See appendix. ■ 
The implication of this theorem is that after a certain number of test statistics, say $k$, are tested against their corresponding thresholds, one can decide whether or not to continue the distributed algorithm with an exponentially small probability of error.

This result further suggests presetting k tests at the beginning of the algorithm, which must be performed regardless 
of the outcome. Note, however, that k can be fixed a priori and does not depend on the size of the SNET. We next show 
that such a modification does not affect important properties of our distributed algorithm. 

Theorem 5.3 Consider the distributed detection algorithm with $k$ preset tests for suitably large $k$. Then: (a) FDR $\le \gamma$, and (b) with the expected number of messages equal to $\max\{k, E(\ldots)\}$, the distributed algorithm achieves the detection power of the centralized BH procedure with high probability.

Proof: (a) We show this part by showing that the distributed algorithm is in fact equivalent to the centralized algorithm and that presetting $k$ tests affects only the communication cost. If there exists a $y_{(i)} \le i\gamma/m$ with $i > k$, then $k$ is immaterial, and the upper bound follows from Theorem 5.1. This is because the centralized algorithm would also map all test statistics smaller than $y_{(i)}$ to the significant hypothesis. Therefore FDR $\le \gamma$ in this case. If there is no such $y_{(i)}$, $i > k$, then the algorithm chooses the largest test statistic $y_{(j)} \le j\gamma/m$, $j \le k$, and maps all smaller test statistics to the significant hypothesis. But here, from Theorem 5.2, it follows that with high probability there is no $i > k$ such that $y_{(i)} \le i\gamma/m$. Hence in both cases the detection power of the centralized algorithm is achieved with high probability. ■

6 Robustness Properties and Non-Ideal Sensing Model 

Our development so far offers a solution to the distributed detection problem in the ideal model. Observe that since the families $\mathcal{G}_{0s}$ and $\mathcal{G}_{1s}$ are singletons, we can use $G_{0s}$ and $G_{1s}$ to define the transformation $Y_s = \chi_s(X_s)$ and use the distributed BH procedure to perform detection. We have shown that this leads to the optimal detection rate under the FDR criterion using the BH procedure, and that it has important scaling properties. In the non-ideal sensing model, the sensors that are outside the radius $d_{\min}$ receive a small residual signal from the objects. Since the received signal is not known, the exact distributions of the observations are not available, i.e., $\mathcal{G}_{0s}$ and $\mathcal{G}_{1s}$ are no longer singletons. This leads to a deviation of $F_{0s}$ from U[0, 1], and $F_{0s}$ becomes a member of a family of distributions $\mathcal{F}_{0s}$. Similarly, $F_{1s}$ becomes a member of a family of distributions $\mathcal{F}_{1s}$.

In this section, we establish certain robustness properties of the proposed test statistics. These properties show that if we know $\mathcal{G}_{0s}$ and $\mathcal{G}_{1s}$ to within $\epsilon$ in terms of a certain distance measure, then we can identify the families $\mathcal{F}_{0s}$ and $\mathcal{F}_{1s}$ to within $\epsilon$ as well. We also establish that the FDR scales gracefully when the distribution of $Y_{0s}$ deviates from U[0, 1] by $\epsilon$ under a suitable metric. Combining these results leads to efficient distributed detection for the non-ideal sensing model with guarantees on performance. We begin with the robustness properties of the proposed test statistics.

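Theorem 6.1 Let $\mu_{0s}$, $\tilde{\mu}_{0s}$, and $\mu_{1s}$ be three measures such that $\mu_{1s} \ll \mu_{0s}$. Let $X_s \sim \mu_{0s}$, $Y_s = \chi_s(X_s) = \mu_{0s}(\{x : \phi(x) > \phi(X_s)\})$, and $\tilde{X}_s \sim \tilde{\mu}_{0s}$, $\tilde{Y}_s = \chi_s(\tilde{X}_s) = \mu_{0s}(\{x : \phi(x) > \phi(\tilde{X}_s)\})$. If $|\tilde{\mu}_{0s}(A) - \mu_{0s}(A)| \le \epsilon\,\mu_{0s}(A)$ for every measurable set $A$, then $|\tilde{F}_s(y_s) - y_s| \le \epsilon\, y_s$ for all $s$, where $\tilde{F}_s$ is the distribution of $\tilde{Y}_s = \chi_s(\tilde{X}_s)$.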

Proof: See appendix. ■ 
Note that we can obtain robustness properties under the significant hypothesis as well. To formalize that result, we state the theorem here and omit its proof, as it repeats that of Theorem 6.1 with minor modifications.

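Theorem 6.2 Let $\mu_{0s}$, $\mu_{1s}$, and $\tilde{\mu}_{1s}$ be three measures such that $\mu_{1s} \ll \mu_{0s}$. Let $X_s \sim \mu_{1s}$, $Y_s = \chi_s(X_s) = \mu_{0s}(\{x : \phi(x) > \phi(X_s)\})$, and $\tilde{X}_s \sim \tilde{\mu}_{1s}$, $\tilde{Y}_s = \chi_s(\tilde{X}_s) = \mu_{0s}(\{x : \phi(x) > \phi(\tilde{X}_s)\})$. If $|\tilde{\mu}_{1s}(A) - \mu_{1s}(A)| \le \epsilon\,\mu_{1s}(A)$ for every measurable set $A$, then $|\tilde{F}_s(y_s) - F_s(y_s)| \le \epsilon F_s(y_s)$, where $F_s$ is the distribution of $Y_s = \chi_s(X_s)$ and $\tilde{F}_s$ is the distribution of $\tilde{Y}_s = \chi_s(\tilde{X}_s)$.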

Proof: Follows similarly to the proof of Theorem 6.1. ■
With these robustness properties of the test statistics, for continuous families $\mathcal{F}_{0s} = \{F_{0s} : |F_{0s}(y) - y| \le \epsilon y\}$, we have an immediate non-asymptotic robustness result, which states that the FDR scales gracefully as a function of $\epsilon$.

Theorem 6.3 Let $Y_{0s}$ have a continuous distribution $F_{0s}(y)$. If $|F_{0s}(y) - y| \le \epsilon y$, the BH procedure bounds the false discovery rate by $\gamma(1 + \epsilon)$, i.e., FDR $\le \gamma(1 + \epsilon)$.

Proof: See appendix. ■ 
The robustness result stated in Theorem 6.3 presents us with an immediate modification to the distributed BH algorithm in order to control the FDR. It suggests that if we wish to control the FDR at level $\gamma$, we only need to input the threshold $\gamma' = \gamma/(1 + \epsilon)$. The distributed BH algorithm with this modification can account for the non-ideal sensing model. Next, using Theorem 6.2, we can establish a theorem parallel to Theorem 5.3. We omit the proof since it follows along the same lines as that of Theorem 5.3; the only modification is that each of the probability expressions is perturbed by a small amount on account of the perturbation of the underlying distributions.
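To make the threshold adjustment concrete, the Python sketch below perturbs the null statistics in one simple way that satisfies the hypothesis of Theorem 6.3: null statistics are drawn from $U[0, 1/(1+\epsilon)]$, whose CDF $F_{0s}(y) = \min\{1, (1+\epsilon)y\}$ obeys $|F_{0s}(y) - y| \le \epsilon y$. Significant statistics come from the assumed pair $N(0, 1)$ versus $N(3, 1)$. Running BH with the deflated threshold $\gamma' = \gamma/(1+\epsilon)$ should keep the empirical FDR below $\gamma$. The perturbation model, sizes, and seed are illustrative assumptions.

```python
import random
from statistics import NormalDist


def benjamini_hochberg(y, gamma):
    """BH step-up rule; returns the set of indices declared significant."""
    m = len(y)
    order = sorted(range(m), key=lambda s: y[s])
    k = max((i for i, s in enumerate(order, 1) if y[s] <= i * gamma / m), default=0)
    return set(order[:k])


def empirical_fdr(gamma, eps, m=200, m1=60, reps=500, seed=0):
    rng = random.Random(seed)
    std = NormalDist()
    gamma_adj = gamma / (1.0 + eps)  # gamma' = gamma / (1 + eps), suggested by Theorem 6.3
    total = 0.0
    for _ in range(reps):
        # Perturbed nulls: U[0, 1/(1+eps)] satisfies |F0(y) - y| <= eps * y.
        nulls = [rng.random() / (1.0 + eps) for _ in range(m - m1)]
        # Significant statistics: Y = 1 - Phi(X) with X ~ N(3, 1) (assumed pair).
        signals = [1.0 - std.cdf(rng.gauss(3.0, 1.0)) for _ in range(m1)]
        y = nulls + signals  # indices < m - m1 are true nulls
        rejected = benjamini_hochberg(y, gamma_adj)
        n_false = sum(1 for s in rejected if s < m - m1)
        total += n_false / max(len(rejected), 1)  # V / R, with 0/0 taken as 0
    return total / reps


print(empirical_fdr(gamma=0.1, eps=0.2))  # empirical FDR; the target level is 0.1
```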

Theorem 6.4 Consider the distributed detection algorithm with $k$ preset tests for suitably large $k$, run with the threshold $\gamma' = \gamma/(1 + \epsilon)$. Let the distributions satisfy the hypotheses of Theorems 6.1 and 6.2. Then: (a) FDR $\le \gamma'(1 + \epsilon) = \gamma$, and (b) with the expected number of messages equal to $\max\{k, E(\ldots)\}$, the distributed algorithm achieves the detection power of the centralized BH procedure with high probability.



7 Simulations 

In this section we present empirical studies of the proposed method based on Monte Carlo simulations. To obtain each data point in our simulation study we performed over 5000 Monte Carlo iterations and found that this was sufficient to ensure confidence in our estimates. We first show that the proposed test statistics perform better than other multidimensional transformations that have been proposed, namely the radial transformation (the multidimensional counterpart of p-values) and the method proposed in [7]. Based on this result, we use the proposed test statistics and show that the BH procedure achieves a near Bayes Oracle error rate, whereas the Bonferroni procedure and uncorrected testing do not. We then present an SNET simulation, where we vary several parameters and examine the error rate and communication cost. For both studies we discuss the relevant parameters and setup in the corresponding sections.
7.1 Synthetic Data with Known Distributions

In our first study, we use the BH procedure to perform detection and compare the error rate with that of the Bayes Oracle. The setup is as follows: there are $m = 1000$ hypothesis tests, and the number of objects $m_1$ is varied from 10% to 90% of $m$. Under $H_s = H_0$, $X_s \sim N(\mathbf{0}, I_3)$, and under $H_s = H_1$, $X_s \sim N(\mathbf{1.5}, I_3)$. Here $I_3$ denotes the $3 \times 3$ identity matrix, $\mathbf{0}$ denotes the $3 \times 1$ zero vector, and $\mathbf{1.5}$ denotes the $3 \times 1$ vector $(1.5\ 1.5\ 1.5)'$, where $'$ denotes the transpose operator. The test statistics are computed in three ways: first, we use the method proposed in this work, i.e., $Y_s = \mu_{0s}\{x : \phi_s(x) > \phi_s(X_s)\}$; next, we use the radial transformation, i.e., $Y_s = \mu_{0s}\{x : g_{0s}(x) < g_{0s}(X_s)\}$; and finally, we use the dimension-by-dimension transformation proposed in [7]. In the dimension-by-dimension approach, the test statistics are calculated for each dimension separately using the marginal distributions.
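The difference between the proposed and the radial statistic is easy to see in this Gaussian setup, where both have closed forms. The proposed $Y_s$ depends on $x$ only through $\sum_j x_j$, so $Y_s = 1 - \Phi(\sum_j x_j/\sqrt{3})$, while the radial statistic is the $\chi^2_3$ tail probability of $\|x\|^2$, for which $\Pr\{\chi^2_3 > q\} = \mathrm{erfc}(\sqrt{q/2}) + \sqrt{2q/\pi}\, e^{-q/2}$. The Python sketch below (function names hypothetical; the dimension-by-dimension method of [7] is omitted) estimates, at a fixed false alarm level of 0.05, how often each statistic flags an observation under either hypothesis.

```python
import math
import random
from statistics import NormalDist

DIM, THETA = 3, 1.5  # X ~ N(0, I_3) under H0 and N(1.5 * 1, I_3) under H1


def proposed_stat(x):
    """mu_0{phi > phi(x)}: phi is increasing in sum(x) for this mean shift."""
    return 1.0 - NormalDist().cdf(sum(x) / math.sqrt(DIM))


def radial_stat(x):
    """mu_0{g0 < g0(x)} = P(chi^2_3 > ||x||^2) for the standard normal null."""
    q = sum(v * v for v in x)
    return math.erfc(math.sqrt(q / 2)) + math.sqrt(2 * q / math.pi) * math.exp(-q / 2)


def flag_rate(stat, mean, level=0.05, n=4000, seed=0):
    """Fraction of draws from N(mean * 1, I_3) whose statistic is <= level."""
    rng = random.Random(seed)
    draws = ([rng.gauss(mean, 1.0) for _ in range(DIM)] for _ in range(n))
    return sum(stat(x) <= level for x in draws) / n


for name, stat in [("proposed", proposed_stat), ("radial", radial_stat)]:
    print(name, "false alarms:", flag_rate(stat, 0.0), "detections:", flag_rate(stat, THETA))
```

Both statistics are uniform under the null, so their false alarm rates match; the proposed statistic detects more often because it uses the direction of the mean shift, which the radial statistic ignores.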

We next input these test statistics to the BH procedure to identify the tests where $H_s = H_1$. The FDR constraint is chosen to be $\gamma = 0.1$. Figure 7(a) presents the error rates associated with each of these methods versus $m_1/m$. Observe that the BH procedure with the proposed test statistics comes close to the Bayes Oracle performance at high sparsity levels (small $m_1/m$). Furthermore, even at low sparsity levels it achieves the smallest error rate.




Figure 7: Monte Carlo simulations comparing error rate vs. object density for (a) different multidimensional transformations and (b) different testing procedures (decision criteria). The BH procedure on the proposed test statistics dominates the other transformations and testing strategies.

With the same setup, we now use the proposed test statistics and assess the error rates of the following detection schemes: the BH procedure, the Bonferroni procedure, and uncorrected testing. The Bonferroni procedure takes the test statistics as input and tests whether $Y_s \le \gamma/m$ to decide which ones have $H_s = H_1$. Similarly, uncorrected testing checks whether $Y_s \le \gamma$. We compare the error rates of these three methods with that of the Bayes Oracle for $m_1$ varying from 10% to 90% of $m$. Figure 7(b) demonstrates these results, which indicate that the performance of the BH procedure is close to that of the Bayes Oracle; the similarity is particularly striking at high sparsity levels (low $m_1/m$).
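The three decision rules differ only in how the threshold is set. The Python sketch below applies them to the proposed statistics in the same three-dimensional Gaussian setup, with an assumed 10% object density, and reports the average number of errors (false alarms plus misses) per run. It is a small-scale illustration with our own choice of sizes and seed, not a reproduction of Figure 7(b); in particular, the Bayes Oracle is not included.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def proposed_stat(x):
    # Equation (2) for N(0, I_3) vs N(1.5 * 1, I_3): depends on x only through sum(x).
    return 1.0 - STD.cdf(sum(x) / math.sqrt(3))


def bh_threshold(y, gamma):
    """Largest BH threshold k * gamma / m with y_(k) <= k * gamma / m (0 if none)."""
    m = len(y)
    ys = sorted(y)
    k = max((i for i in range(1, m + 1) if ys[i - 1] <= i * gamma / m), default=0)
    return k * gamma / m


def average_errors(m=1000, m1=100, gamma=0.1, reps=20, seed=0):
    rng = random.Random(seed)
    errors = {"BH": 0, "Bonferroni": 0, "Uncorrected": 0}
    for _ in range(reps):
        truth = [s < m1 for s in range(m)]  # first m1 tests have H_s = H_1
        y = [proposed_stat([rng.gauss(1.5 if sig else 0.0, 1.0) for _ in range(3)])
             for sig in truth]
        thresholds = {"BH": bh_threshold(y, gamma),
                      "Bonferroni": gamma / m,
                      "Uncorrected": gamma}
        for rule, thr in thresholds.items():
            # Declare H_1 iff y_s <= thr; an error is any disagreement with the truth.
            errors[rule] += sum((ys <= thr) != sig for ys, sig in zip(y, truth))
    return {rule: total / reps for rule, total in errors.items()}


print(average_errors())
```

Bonferroni pays mostly in misses and uncorrected testing mostly in false alarms, while the data-dependent BH threshold balances the two.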

7.2 SNET Simulation with Nonideal Model 

In our next study we set up an SNET simulation to study the effects of SNET size, object density, and attenuation on the performance of the proposed method. We present the performance of the Bayes Oracle as a reference point when appropriate.

First we set up an $n \times n$ grid of sensors, where the distance between neighboring sensors is 4 units. Then we place a number of objects at randomly chosen locations over the SNET, where the possible locations are the centers of the grid squares. Figure 8 depicts this setup with a $3 \times 3$ grid and one object.

Figure 8: A $3 \times 3$ sensor grid and an object.
where $d(s, t)$ is the distance between sensor $s$ and object $t$, $d_0 = 2\sqrt{2}$ is the distance between an object and the nearest sensor, $\alpha$ is the attenuation coefficient, and $v_s \sim N(0, I_3)$ is the noise. Here we have

$$H_s = H_0:\; d(s, t) > d_0 \text{ for all objects } t, \qquad H_s = H_1:\; d(s, t) \le d_0 \text{ for some object } t.$$

We choose $d_{\min} = d_0$ for the attenuation model. This choice fixes the nominal signal-to-noise ratio (SNR) at the sensors. In other words, in the presence of a single object in the entire sensor field, the sensors in the immediate vicinity of the object receive a signal with the same SNR irrespective of the attenuation coefficient $\alpha$. On the other hand, signals from objects that are not in the immediate vicinity suffer from path losses, allowing us to study the impact of perturbations. Other choices for $d_{\min}$ are possible; however, they scale both the interference and the nominal signal.

In our setup, note that the second smallest distance between a sensor and an object is $\sqrt{6^2 + 2^2} = \sqrt{40}$ units. This implies that we have two candidates for the nominal null distribution: (a) $g_{0s}$ is a normal distribution with zero mean and the noise covariance; (b) $g_{0s}$ is a normal distribution with mean equal to the signal received from a hypothetical object located at a distance of $\sqrt{40}$ units. For the significant hypothesis, we always assume the nominal distribution $g_{1s} = N(\theta_t, I_3)$. In our experiments we found the error rates for the two nominal null distributions to be similar; the differences appeared in the composition of false alarms and misses. This is because the second assumption is conservative, i.e., a distant object is assumed even if no such object exists. Our simulations allowed more than one object in the immediate vicinity of a sensor; however, we did not notice any degradation in performance.
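The grid geometry can be checked directly. The Python sketch below builds the sensor grid (spacing 4 units) and places an object at the center of an interior grid square: the four nearest sensors lie at $d_0 = 2\sqrt{2}$ and the next ring of eight sensors at $\sqrt{40}$. The helper `assumed_mean_signal` uses a HYPOTHETICAL path-loss form $\theta\,(d_0/d)^{\alpha}$, chosen only because it has the stated property that the nearest sensors see the same signal for every $\alpha$; it is not necessarily the attenuation model used in the paper.

```python
import math

SPACING = 4.0          # distance between neighbouring sensors
D0 = 2 * math.sqrt(2)  # distance from an object to its nearest sensors


def sensor_grid(n):
    """Positions of an n x n sensor grid."""
    return [(SPACING * i, SPACING * j) for i in range(n) for j in range(n)]


def square_center(i, j):
    """Center of grid square (i, j): the candidate object locations."""
    return (SPACING * (i + 0.5), SPACING * (j + 0.5))


def assumed_mean_signal(theta, d, alpha):
    """HYPOTHETICAL path-loss model theta * (D0 / d) ** alpha (illustration only)."""
    return theta * (D0 / d) ** alpha


sensors = sensor_grid(25)  # m = 625
obj = square_center(12, 12)
dists = sorted(math.dist(s, obj) for s in sensors)
print(len(sensors), dists[:4], dists[4])  # 625, four distances of 2*sqrt(2), then sqrt(40)
print(assumed_mean_signal(2.0, math.sqrt(40), 2))  # about 0.4 under the assumed model
```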

Effect of Object Density: In our first study we have $n = 25$, which leads to $m = 625$ sensors. We choose $\gamma = 0.1$, $\theta_t = (2\ 2\ 2)'$, and $\alpha = 2$. We then vary the number of objects such that $m_1/m \in \{0.03, 0.06, \ldots, 0.15\}$, and observe the error rate and communication cost of the distributed BH procedure, the Bonferroni procedure, uncorrected testing, and the Bayes Oracle. Figures 9(a) and 9(b) demonstrate the results of this study. Notice that the error rate of the distributed BH procedure again closely tracks that of the Bayes Oracle, even though the Bayes Oracle uses knowledge of $m_1/m$ and of the actual distribution of observations at each sensor, whereas the distributed BH procedure only uses the assumed distributions.

The expected ratio of communication cost to $m_1$ remains near 1 for the distributed BH procedure, whereas it deviates significantly from 1 for the Bonferroni procedure and uncorrected testing. This is because the Bonferroni procedure, due to its stringent threshold $\gamma/m = 0.1/625$, misses most of the sensors that are within $d_{\min}$ of an object. At the other extreme, uncorrected testing suffers from a large number of false alarms, which is a constant proportion of $m$, and therefore its communication cost is significantly larger than $m_1$.

Figure 9: Monte Carlo simulations comparing error rate vs. object density for the proposed statistics in the SNET setup: the distributed BH procedure using the proposed test statistics achieves the minimal error rate and closely tracks the performance of the Bayes Oracle. Here $m = 625$, $\theta_t = (2\ 2\ 2)'$, $v_s \sim N(0, I_3)$, $\alpha = 2$.

Effect of Attenuation Coefficient: We next study how the attenuation coefficient $\alpha$ affects the error rates and communication costs of the competing schemes. For this setup we again have $n = 25$, which leads to $m = 625$ sensors, and we choose $\gamma = 0.1$ and $\theta_t = (1.5\ 1.5\ 1.5)'$. The object density is fixed at $m_1/m = 0.1$. We vary $\alpha \in \{2, 2.2, 2.4, \ldots, 4\}$ and observe the error rate and communication cost versus $\alpha$. Intuitively, as we increase the attenuation coefficient, the interference from distant objects decreases and the distributions $g_{0s}$ and $g_{1s}$ become more separable. This in turn is expected to decrease the error rate and increase the detection rate. Increasing the detection rate increases the communication costs. These are precisely the effects we observe in Figures 10(a) and 10(b).
Figure 10: Monte Carlo comparison of error rate and communication cost as a function of the attenuation coefficient ($\alpha$) for different strategies, with 625 multi-modal sensors, 62 of which are in the immediate vicinity of a target. The parameters governing the sensing model were $\theta_t = (1.5\ 1.5\ 1.5)'$, with $v_s \sim N(0, I_3)$. The proposed distributed BH procedure achieves the minimal error rate and closely tracks the performance of the Bayes Oracle when we use the proposed test statistics. As $\alpha$ increases, the distributions become more separable, which decreases the error rate and increases the detection rate and the associated communication cost.

Effect of SNET size: In our final study we examine the effect of the SNET size on communication cost. We fix the number of objects and increase the size of the SNET grid. The goal is to show that for the BH procedure, regardless of the size of the SNET, the communication cost scales with $m_1$, the number of sensors in the vicinity of an object, and not with $m$, the size of the SNET. For this study we fix $\alpha = 2$, $\gamma = 0.1$, $m_1 = 60$, and $\theta_t = (2\ 2\ 2)'$. We then vary $n \in \{25, 35, 45, 55\}$, which leads to $m \in \{625, 1225, 2025, 3025\}$. Figure 11 shows the results of this study. For Uncorrected testing the communication cost increases linearly with the SNET size, whereas the distributed BH procedure keeps the communication cost at a nearly constant multiple of $m_1$. The Bonferroni procedure has the lowest communication cost, but only because its detection rate is very small.
Figure 11: Monte Carlo simulation of communication cost$/m_1$ versus SNET size ($m$) using the proposed statistics in the SNET setup: as $m$ increases, the communication cost of Uncorrected testing increases, whereas the distributed BH procedure keeps the communication cost at a nearly constant multiple of $m_1$. The Bonferroni procedure has the lowest communication cost, but only because its detection rate is very small. Here $\alpha = 2$, $\gamma = 0.1$, $m_1 = 60$, and $\theta_t = (2\ 2\ 2)'$.

8 Conclusion 

In this paper we developed tools for detecting localized events, sources, or abnormalities within SNETs. Unlike decentralized detection, where the information is globally available, the focus here was on problems in which only a small number of sensors in the vicinity of the phenomena can observe them. We call these local information problems. For such problems the main difficulty arises from the coupling of (a) uncertainty in the number of events, sources, or abnormalities and in their possible locations, and (b) the multiplicity of false alarms. Although not evident at first sight, these fundamental difficulties call for collaboration within the SNET in order to meet global constraints.

We proposed FDR as a performance criterion for local information problems in SNETs because FDR adapts to the unknown object density, which is of great importance for distributed detection problems: we know neither how many events take place at any given time nor where they occur. The adaptive nature of FDR makes it a valuable tool for addressing these issues.

We next introduced a transformation that maps multidimensional observations to scalar test statistics with properties that are important for distributed algorithms; in particular, the ordered test statistics are asymptotically samples of a convex function. This allowed us to devise a distributed BH procedure, a first-crossing procedure whose communication cost scales favorably: it grows with the number of significant sensors (sensors in the close vicinity of an object), not with the size of the whole SNET. We also showed that the distributed BH procedure achieves the performance of its centralized counterpart.
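Why a first-crossing rule can suffice is illustrated by the toy sketch below, which operates on an already sorted list of statistics (an assumption; in the SNET the ordering is implicit and distributed). If the ordered statistics lie on a convex curve and $p_{(1)} \le \gamma/m$, then $p_{(k)} - k\gamma/m$ is convex in $k$, so the indices satisfying the BH condition form a contiguous run starting at $k = 1$. Scanning upward from the smallest statistic and stopping at the first violation then returns the standard BH cutoff while reading at most $k^* + 1$ values. The function names are hypothetical, and the sketch is not the paper's message-passing protocol.

```python
def bh_last_crossing(sorted_p, gamma):
    """Standard BH cutoff: largest k with p_(k) <= k * gamma / m (0 if none)."""
    m = len(sorted_p)
    k_star = 0
    for k in range(1, m + 1):
        if sorted_p[k - 1] <= k * gamma / m:
            k_star = k
    return k_star

def first_crossing(sorted_p, gamma):
    """Scan upward from the smallest statistic and stop at the first k whose
    statistic exceeds k * gamma / m; at most k* + 1 values are read."""
    m = len(sorted_p)
    k = 0
    while k < m and sorted_p[k] <= (k + 1) * gamma / m:
        k += 1
    return k

m, gamma = 100, 0.1
convex = [0.6 * (k / m) ** 2 for k in range(1, m + 1)]  # convex, starts below the BH line
print(bh_last_crossing(convex, gamma), first_crossing(convex, gamma))

non_convex = [0.01, 0.12, 0.14, 0.9]                   # dips back below the line at k = 3
print(bh_last_crossing(non_convex, 0.2), first_crossing(non_convex, 0.2))
```

The second example shows why the convex shape matters: without it, the first crossing can stop early (here at $k = 1$) while BH would reject three hypotheses.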

We quantified the robustness of the distributed algorithm and of the proposed transformation to unknown perturbations of the nominal distribution. This issue is particularly relevant in a sensing field where the path losses and attenuation coefficients are not known. The simulation studies confirmed this robustness: the distributed BH procedure tracks the performance of the Bayes Oracle in terms of error rate, even in the non-ideal model, with communication costs that scale with the number of significant sensors.
Appendix

Proof of Theorem HZD 

First note that from Lagrangian duality it follows that

$$\gamma_m = \min_{u} \max_{\{H_s\}} \Big( \Pr\{V \ge 1 \mid \{H_s : s \in S\}\} + \Pr\{T \ge 1 \mid \{H_s : s \in S\}\} \Big) \ge \max_{\Pr\{(H_s) : s \in S\}} \min_{u} \Big( \Pr(V \ge 1) + \Pr(T \ge 1) \Big)$$

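where we can substitute any prior distribution for $\Pr\{(H_s) : s \in S\}$. Consequently, it remains to establish a bound for the Bayesian problem. Now observe that the error event is

$$\mathcal{E} = \{u(X_1, X_2, \ldots, X_m) \neq \{H_s : s \in S\}\} = \{V \ge 1\} \cup \{T \ge 1\}.$$

Therefore, from Fano's inequality it follows that for any strategy $u(\cdot)$:

$$\Pr(V \ge 1) + \Pr(T \ge 1) \ge \Pr(\mathcal{E}) \ge \frac{1}{m}\,\mathcal{H}(\{H_s : s \in S\} \mid X_1, X_2, \ldots, X_m) - \frac{1}{m} = \mathcal{H}(H_s \mid X_s) - \frac{1}{m}$$

where $\mathcal{H}(\cdot \mid \cdot)$ denotes the conditional entropy (in bits). The last equality follows by substituting an independent Bernoulli prior for the presence or absence of objects: together with the conditional independence of the observations, the joint conditional entropy then splits into $m$ per-sensor terms, each equal to $\mathcal{H}(H_s \mid X_s)$. ■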
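To get a feel for the size of this lower bound, the per-sensor conditional entropy of $H_s$ given $X_s$, minus $1/m$, can be evaluated numerically. The sketch below assumes a scalar illustrative model that is not the paper's multi-modal setup: $H_s \sim \text{Bernoulli}(\pi)$, $X_s \mid H_0 \sim \mathcal{N}(0, 1)$, $X_s \mid H_1 \sim \mathcal{N}(\delta, 1)$, with entropy in bits so that the logarithm of the number of hypothesis configurations, $\log_2 2^m$, equals $m$ in Fano's inequality. The integral is approximated by a midpoint rule on a truncated grid, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def binary_entropy(q):
    """Entropy in bits of a Bernoulli(q) variable."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def conditional_entropy(pi, delta, lo=-12.0, hi=12.0, steps=20000):
    """H(H_s | X_s) in bits for H_s ~ Bernoulli(pi), X_s | H_0 ~ N(0, 1),
    X_s | H_1 ~ N(delta, 1), via the midpoint rule on [lo, hi]."""
    f0, f1 = NormalDist(0.0, 1.0), NormalDist(delta, 1.0)
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        p0, p1 = (1.0 - pi) * f0.pdf(x), pi * f1.pdf(x)
        px = p0 + p1
        if px > 0.0:
            total += px * binary_entropy(p1 / px) * dx  # p1 / px = Pr{H_1 | x}
    return total

def fano_lower_bound(pi, delta, m):
    """Right-hand side H(H_s | X_s) - 1/m of the Fano bound."""
    return conditional_entropy(pi, delta) - 1.0 / m

for delta in (0.5, 1.0, 2.0, 4.0):
    print(f"delta = {delta}: H(H_s | X_s) - 1/m = {fano_lower_bound(0.1, delta, 625):.4f}")
```

As expected, the bound shrinks as the two hypotheses become more separable, since the observation then leaves less uncertainty about $H_s$.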

Proof of Theorem 4.1

Note that for any sequence $\phi_1 > \phi_2 > \cdots$ the sets $A_i = \{x : \phi_s(x) > \phi_i\}$ form a nested sequence of sets such that $A_1 \subset A_2 \subset \cdots$. Then

$$
\begin{aligned}
\Pr\{Y_s \le y_s\} &= \Pr\{\mu_{0s}\{x : \phi_s(x) > \phi_s(X_s)\} \le \mu_{0s}\{x : \phi_s(x) > \phi_s(x_s)\}\} \\
&= \Pr\{\mu_{0s}\{x : \phi_s(x) > \phi_s(X_s)\} < \mu_{0s}\{x : \phi_s(x) > \phi_s(x_s)\}\} \\
&= \Pr\{\{x : \phi_s(x) > \phi_s(X_s)\} \subset \{x : \phi_s(x) > \phi_s(x_s)\}\} \\
&= \Pr\{\phi_s(X_s) > \phi_s(x_s)\} = \Pr\{x : \phi_s(x) > \phi_s(x_s)\} = \mu_{0s}\{x : \phi_s(x) > \phi_s(x_s)\} = y_s
\end{aligned}
$$

where the probability measure is $\mu_{0s}$, the second equality follows from the continuity of $Y_s$, and the third equality follows from the fact that the sets are nested. The independence of the test statistics under the null hypothesis follows from our conditional independence assumptions of Section 3. ■
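The uniformity of $Y_s$ under the null can be checked by simulation. The sketch assumes an illustrative Gaussian shift model, $\mu_{0s} = \mathcal{N}(0, I_3)$ and $\mu_{1s} = \mathcal{N}(\theta, I_3)$, with $\theta = (2\ 2\ 2)'$ borrowed from the simulation settings. In this model the likelihood ratio is increasing in $\theta' x$, and $\theta' X / \|\theta\|$ is standard normal under the null, so $Y_s = 1 - \Phi(\theta' X_s / \|\theta\|)$ in closed form. Function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def lr_pvalue(x, theta):
    """Y_s = mu_0{x' : phi(x') > phi(x)} for mu_0 = N(0, I), mu_1 = N(theta, I).
    phi is increasing in theta . x, so Y_s = 1 - Phi(theta . x / ||theta||)."""
    norm = math.sqrt(sum(t * t for t in theta))
    proj = sum(t * xi for t, xi in zip(theta, x)) / norm
    return 1.0 - NormalDist().cdf(proj)

def empirical_cdf_under_null(theta, n_samples, grid, seed=0):
    """Draw X_s ~ N(0, I), compute Y_s, and return Pr{Y_s <= g} for each g in grid."""
    rng = random.Random(seed)
    ys = [lr_pvalue([rng.gauss(0.0, 1.0) for _ in theta], theta) for _ in range(n_samples)]
    return [sum(y <= g for y in ys) / n_samples for g in grid]

theta = (2.0, 2.0, 2.0)
grid = [0.1, 0.25, 0.5, 0.75, 0.9]
cdf = empirical_cdf_under_null(theta, 20000, grid)
for g, c in zip(grid, cdf):
    print(f"y = {g:.2f}  empirical Pr(Y_s <= y) = {c:.3f}")
```

The empirical CDF should agree with the identity $y$ up to Monte Carlo error of roughly $\sqrt{y(1-y)/20000}$.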

Proof of Theorem 4.2

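We can write for $Y_s$ and $Z_s$:

$$
\begin{aligned}
\Pr\{Y_s \le \gamma_s\} &= \pi \Pr\{Y_s \le \gamma_s \mid H_s = H_1\} + (1 - \pi) \Pr\{Y_s \le \gamma_s \mid H_s = H_0\} \\
&= \pi \Pr\{Y_s \le \gamma_s \mid H_s = H_1\} + (1 - \pi)\gamma_s \\
\Pr\{Z_s \le \gamma_s\} &= \pi \Pr\{Z_s \le \gamma_s \mid H_s = H_1\} + (1 - \pi) \Pr\{Z_s \le \gamma_s \mid H_s = H_0\} \\
&= \pi \Pr\{Z_s \le \gamma_s \mid H_s = H_1\} + (1 - \pi)\gamma_s
\end{aligned}
$$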

Then, to prove our result, it suffices to show that $\Pr\{Y_s \le \gamma_s \mid H_s = H_1\} \ge \Pr\{Z_s \le \gamma_s \mid H_s = H_1\}$. To show this, let $A_Y = \{x : \phi_s(x) > \phi_1\}$ be the set such that $\mu_{0s}(A_Y) = \gamma_s$. For $Z_s = \zeta_s(X_s)$, the assumption that $Z_s$ is uniformly distributed under the null hypothesis implies $\Pr\{Z_s \le \gamma_s \mid H_s = H_0\} = \mu_{0s}\{x : \zeta_s(x) \le \gamma_s\} = \gamma_s$. Write $A_Z = \{x : \zeta_s(x) \le \gamma_s\}$. Then $\mu_{1s}(A_Y) = \mu_{1s}(A_Y - A_Z) + \mu_{1s}(A_Y \cap A_Z)$, and similarly $\mu_{1s}(A_Z) = \mu_{1s}(A_Z - A_Y) + \mu_{1s}(A_Y \cap A_Z)$, where $A - B$ denotes the removal of set $B$ from set $A$.

Observe that showing $\Pr\{Y_s \le \gamma_s \mid H_s = H_1\} \ge \Pr\{Z_s \le \gamma_s \mid H_s = H_1\}$ is equivalent to showing

$$\mu_{1s}(A_Y) - \mu_{1s}(A_Z) = \mu_{1s}(A_Y - A_Z) - \mu_{1s}(A_Z - A_Y) \ge 0.$$

To show this, observe that $\mu_{0s}(A_Y - A_Z) = \mu_{0s}(A_Z - A_Y) = \gamma'$, where $\gamma' = \gamma_s - \mu_{0s}(A_Y \cap A_Z)$. Over $A_Y - A_Z$, $d\mu_{1s}/d\mu_{0s} = \phi_s > \phi_1$, and hence $\mu_{1s}(A_Y - A_Z) \ge \phi_1 \gamma'$. Similarly, over $A_Z - A_Y$, $d\mu_{1s}/d\mu_{0s} = \phi_s \le \phi_1$, and hence $\mu_{1s}(A_Z - A_Y) \le \phi_1 \gamma'$. This implies $\mu_{1s}(A_Y) - \mu_{1s}(A_Z) \ge 0$ and concludes the proof. ■
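This is a Neyman-Pearson argument, and in the illustrative Gaussian shift model ($\mu_{0s} = \mathcal{N}(0, I_3)$, $\mu_{1s} = \mathcal{N}(\theta, I_3)$) it can be made concrete. Any statistic of the hypothetical form $Z_s = 1 - \Phi(w' X_s / \|w\|)$ is uniform under the null, and its detection probability at level $\gamma_s$ is $1 - \Phi(z_{1-\gamma_s} - w'\theta/\|w\|)$. By Cauchy-Schwarz this is largest for $w = \theta$, which corresponds to the likelihood-ratio statistic $Y_s$. The weight vectors and function name below are illustrative choices.

```python
import math
from statistics import NormalDist

N = NormalDist()

def detection_prob(w, theta, gamma_s):
    """Pr{Z_s <= gamma_s | H_1} for Z_s = 1 - Phi(w'X / ||w||) with X ~ N(theta, I).
    Under H_0 (theta = 0) this statistic is uniform on [0, 1]."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    mean_shift = sum(wi * ti for wi, ti in zip(w, theta)) / norm_w
    z_crit = N.inv_cdf(1.0 - gamma_s)  # Z_s <= gamma_s  <=>  w'X/||w|| >= z_crit
    return 1.0 - N.cdf(z_crit - mean_shift)

theta = (1.5, 1.5, 1.5)
gamma_s = 0.1
power_Y = detection_prob(theta, theta, gamma_s)  # w = theta: the likelihood-ratio statistic Y_s
others = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, -1.0, 2.0)]
for w in others:
    print(w, round(detection_prob(w, theta, gamma_s), 3), "<=", round(power_Y, 3))
```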
Proof of Theorem 4.3

Again noting that for any sequence $\phi_1 > \phi_2 > \cdots$ the sets $A_i = \{x : \phi_s(x) > \phi_i\}$ form a nested sequence of sets such that $A_1 \subset A_2 \subset \cdots$, we can write:

$$
\begin{aligned}
F_{1s}(y_s) &= \Pr\{Y_s \le y_s\} \\
&= \Pr\{\mu_{0s}\{x : \phi_s(x) > \phi_s(X_s)\} \le \mu_{0s}\{x : \phi_s(x) > \phi_s(x_s)\}\} \\
&= \Pr\{\mu_{0s}\{x : \phi_s(x) > \phi_s(X_s)\} < \mu_{0s}\{x : \phi_s(x) > \phi_s(x_s)\}\} \\
&= \Pr\{\{x : \phi_s(x) > \phi_s(X_s)\} \subset \{x : \phi_s(x) > \phi_s(x_s)\}\} \\
&= \Pr\{\phi_s(X_s) > \phi_s(x_s)\} = \Pr\{x : \phi_s(x) > \phi_s(x_s)\} = \mu_{1s}\{x : \phi_s(x) > \phi_s(x_s)\}
\end{aligned}
$$

Observe that here the probability measure is $\mu_{1s}$, because $X_s$ is sampled with respect to $G_{1s}$.

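Now, let $\phi_1 > \phi_2 > \phi_3$ be such that, for $A_i = \{x : \phi_s(x) > \phi_i\}$, $i = 1, 2, 3$, we have $\mu_{0s}(A_1) = y_s^1$, $\mu_{0s}(A_2) = y_s^2 = y_s^1 + \delta_0$, and $\mu_{0s}(A_3) = y_s^3 = y_s^2 + \delta_0$ for some appropriate $y_s^1$ and $\delta_0$. Also, $F_{1s}(y_s^1) = \mu_{1s}(A_1) = z_s^1$, $F_{1s}(y_s^2) = \mu_{1s}(A_2) = z_s^2 = z_s^1 + \delta_1$, and $F_{1s}(y_s^3) = \mu_{1s}(A_3) = z_s^3 = z_s^2 + \delta_2$ for some appropriate $\delta_1$ and $\delta_2$.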

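Notice that $\delta_1 = \mu_{1s}(A_2 - A_1)$ and $\delta_2 = \mu_{1s}(A_3 - A_2)$. Note also that $\mu_{0s}(A_2 - A_1) = \mu_{0s}(A_3 - A_2) = \delta_0$, and that the $A_i$ are constructed from the Radon-Nikodym derivative $\phi_s = d\mu_{1s}/d\mu_{0s}$: on $A_2 - A_1$ we have $\phi_s > \phi_2$, while on $A_3 - A_2$ we have $\phi_s \le \phi_2$. Hence $\delta_1 \ge \phi_2 \delta_0 \ge \delta_2$, and we can write

$$\frac{F_{1s}(y_s^2) - F_{1s}(y_s^1)}{y_s^2 - y_s^1} = \frac{\delta_1}{\delta_0} \ge \frac{\delta_2}{\delta_0} = \frac{F_{1s}(y_s^3) - F_{1s}(y_s^2)}{y_s^3 - y_s^2}.$$

But this holds for all $\phi_1 > \phi_2 > \phi_3$ such that $\delta_0 > 0$, and hence the result follows. ■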
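In the illustrative Gaussian shift model with $\delta = \|\theta\|$, the alternative CDF of the likelihood-ratio statistic has the closed form $F_{1s}(y) = 1 - \Phi(\Phi^{-1}(1-y) - \delta)$, whose derivative $\exp(\delta\,\Phi^{-1}(1-y) - \delta^2/2)$ decreases in $y$. The sketch below checks the slope inequality of the proof numerically on a uniform grid (grid size and function names are arbitrary choices). A concave increasing $F_{1s}$ has a convex inverse, which is consistent with the convex shape of the ordered statistics exploited by the distributed BH procedure.

```python
from statistics import NormalDist

N = NormalDist()

def alt_cdf(y, delta):
    """F_1s(y) = Pr{Y_s <= y | H_1} for the LR statistic in the Gaussian shift
    model with delta = ||theta|| (valid for 0 < y < 1)."""
    return 1.0 - N.cdf(N.inv_cdf(1.0 - y) - delta)

def secant_slopes(delta, n=200):
    """Slopes (F(y_{i+1}) - F(y_i)) / (y_{i+1} - y_i) on an interior uniform grid."""
    ys = [k / n for k in range(1, n)]
    fs = [alt_cdf(y, delta) for y in ys]
    return [(f2 - f1) / (y2 - y1) for y1, y2, f1, f2 in zip(ys, ys[1:], fs, fs[1:])]

slopes = secant_slopes(delta=1.5 * 3 ** 0.5)  # ||(1.5, 1.5, 1.5)||
# Concavity: the secant slopes should be non-increasing in y.
print("first slopes:", [round(v, 3) for v in slopes[:3]])
print("last slopes: ", [round(v, 5) for v in slopes[-3:]])
```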
Proof of Theorem O 

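Let $N_k = |\{j : Y_j \le l_k\}| = \sum_{j=1}^{m} \mathbf{1}\{Y_j \le l_k\}$. By the switching relation (see, for example, [1]), the following relationship holds for any $k$: $\{E(Y_{(k)}) \le l_k\} \Leftrightarrow \{E(N_k) \ge k\}$. Therefore, $E(Y_{(\lceil k/(1-\epsilon) \rceil)}) \le l_k \Rightarrow E(N_k) \ge \frac{k}{1-\epsilon}$, that is, $k \le E(N_k)(1 - \epsilon)$. Hence

$$
\begin{aligned}
\Pr\{Y_{(k)} > l_k\} = \Pr\{N_k < k\} &\le \Pr\{N_k \le E(N_k)(1 - \epsilon)\} \\
&\le \exp\left\{-\frac{\epsilon^2 E(N_k)}{2}\right\} \qquad (4) \\
&\le \exp\left\{-\frac{\epsilon^2 k}{2(1 - \epsilon)}\right\} \qquad (5)
\end{aligned}
$$

Inequality (4) follows from the Chernoff bound, and inequality (5) follows from the application of the switching relation along with the assumption of the theorem. ■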
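Both inequalities can be checked numerically in a simple special case. The sketch assumes, for illustration only, that the indicators $\mathbf{1}\{Y_j \le l_k\}$ are i.i.d., so that $N_k$ is binomial; it compares the exact probability $\Pr\{N_k < k\}$ with the right-hand sides of (4) and (5), choosing $k$ so that $k \le E(N_k)(1 - \epsilon)$ as the proof requires. The values $m = 625$ and $\epsilon = 0.3$ and the function names are arbitrary.

```python
import math

def binom_cdf(n, p, t):
    """Exact Pr{N <= t} for N ~ Binomial(n, p); returns 0 for t < 0."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(t + 1))

def check_bounds(n, p, eps):
    mean = n * p                               # plays the role of E(N_k)
    k = math.floor((1 - eps) * mean)           # guarantees k <= E(N_k)(1 - eps)
    exact = binom_cdf(n, p, k - 1)             # Pr{N_k < k}
    bound4 = math.exp(-eps**2 * mean / 2)                # right-hand side of (4)
    bound5 = math.exp(-eps**2 * k / (2 * (1 - eps)))     # right-hand side of (5)
    return exact, bound4, bound5

for p in (0.02, 0.05, 0.1):
    exact, b4, b5 = check_bounds(625, p, 0.3)
    print(f"p = {p}: Pr(N_k < k) = {exact:.2e} <= (4) {b4:.2e} <= (5) {b5:.2e}")
```

Bound (5) is looser than (4) exactly because it replaces $E(N_k)$ by its lower bound $k/(1-\epsilon)$.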

Proof of Theorem 6.1

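We know from Theorem 4.1 that $Y_s$ is uniformly distributed in $[0, 1]$ under the nominal null distribution. Similarly to the development of that theorem,

$$
\begin{aligned}
\Pr\{Y_s \le y_s\} &= \Pr\{\mu_{0s}\{x : \phi_s(x) > \phi_s(X_s)\} \le \mu_{0s}\{x : \phi_s(x) > \phi_s(x_s)\}\} \\
&= \Pr\{\mu_{0s}\{x : \phi_s(x) > \phi_s(X_s)\} < \mu_{0s}\{x : \phi_s(x) > \phi_s(x_s)\}\} \\
&= \Pr\{\{x : \phi_s(x) > \phi_s(X_s)\} \subset \{x : \phi_s(x) > \phi_s(x_s)\}\} \\
&= \Pr\{\phi_s(X_s) > \phi_s(x_s)\} = \Pr\{x : \phi_s(x) > \phi_s(x_s)\}
\end{aligned}
$$

Here the probability measure is the perturbed null measure $\tilde{\mu}_{0s}$, since $X_s$ is drawn with respect to that measure. Then $F_{0s}(y_s) = \Pr\{Y_s \le y_s\} = \tilde{\mu}_{0s}\{x : \phi_s(x) > \phi_s(x_s)\}$. However, by the hypothesis of the theorem, $|\tilde{\mu}_{0s}(A) - \mu_{0s}(A)| \le \epsilon\,\mu_{0s}(A)$ for every measurable set $A$, and hence

$$|\tilde{\mu}_{0s}\{x : \phi_s(x) > \phi_s(x_s)\} - \mu_{0s}\{x : \phi_s(x) > \phi_s(x_s)\}| \le \epsilon\,\mu_{0s}\{x : \phi_s(x) > \phi_s(x_s)\}.$$

Finally, noting that $\mu_{0s}\{x : \phi_s(x) > \phi_s(x_s)\} = y_s$, we obtain $|F_{0s}(y_s) - y_s| \le \epsilon y_s$, which is the result. ■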
Proof of Theorem 6.3

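Define $\gamma_k = k\gamma/m$. Let $Y_{0s}$, $s = 1, 2, \ldots, m_0$, be the $m_0$ test statistics under the null hypothesis. Denote by $C_s(k)$ the event that, if $Y_{0s}$ is mapped to $H_s = H_1$, exactly $k - 1$ other test statistics are also mapped to $H_1$. Then

$$
\begin{aligned}
E(V/R) &= \sum_{s=1}^{m_0} \sum_{k=1}^{m} \frac{1}{k} \Pr\{Y_{0s} \le \gamma_k, C_s(k)\} = \sum_{s=1}^{m_0} \sum_{k=1}^{m} \frac{1}{k} \Pr\{Y_{0s} \le \gamma_k\} \Pr\{C_s(k)\} \\
&\le \sum_{s=1}^{m_0} \sum_{k=1}^{m} \frac{\gamma_k}{k} (1 + \epsilon) \Pr\{C_s(k)\} = (1 + \epsilon) \sum_{s=1}^{m_0} \frac{\gamma}{m} \sum_{k=1}^{m} \Pr\{C_s(k)\} \\
&= (1 + \epsilon) \sum_{s=1}^{m_0} \frac{\gamma}{m} = (1 + \epsilon) \frac{m_0 \gamma}{m} \le (1 + \epsilon)\gamma
\end{aligned}
$$

The second equality follows because $Y_{0s}$ is independent of all other test statistics. The inequality uses the robustness bound of the preceding theorem, $\Pr\{Y_{0s} \le \gamma_k\} = F_{0s}(\gamma_k) \le (1 + \epsilon)\gamma_k$, and the subsequent equality uses the fact that the events $C_s(1), \ldots, C_s(m)$ partition the sample space, so that $\sum_{k=1}^{m} \Pr\{C_s(k)\} = 1$. ■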
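The bound can be checked by Monte Carlo. The sketch assumes independent statistics and a hypothetical perturbed null in which each null statistic is uniform on $[0, 1/(1+\epsilon)]$, so that $F_{0s}(y) = \min(1, (1+\epsilon)y)$ satisfies $|F_{0s}(y) - y| \le \epsilon y$ for all $y \in [0, 1]$. Alternative statistics are drawn as $U^6$ with $U$ uniform, an arbitrary choice that makes them stochastically small. The estimated FDR should not exceed $(1+\epsilon) m_0 \gamma / m$ beyond Monte Carlo error; the sizes $m = 200$, $m_0 = 180$ and the function names are illustrative.

```python
import random

def bh_reject(pvals, gamma):
    """Indices rejected by the Benjamini-Hochberg procedure at level gamma."""
    m = len(pvals)
    order = sorted(range(m), key=lambda s: pvals[s])
    k_star = 0
    for k, s in enumerate(order, start=1):
        if pvals[s] <= k * gamma / m:
            k_star = k
    return order[:k_star]

def estimate_fdr(m, m0, gamma, eps, trials=2000, seed=1):
    """Monte Carlo estimate of E(V/R), with V/R taken as 0 when R = 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Perturbed nulls (indices 0..m0-1): uniform on [0, 1/(1+eps)].
        nulls = [rng.random() / (1.0 + eps) for _ in range(m0)]
        # Hypothetical alternatives: stochastically small statistics.
        alts = [rng.random() ** 6 for _ in range(m - m0)]
        rejected = bh_reject(nulls + alts, gamma)
        if rejected:
            false_discoveries = sum(1 for s in rejected if s < m0)
            total += false_discoveries / len(rejected)
    return total / trials

m, m0, gamma = 200, 180, 0.1
for eps in (0.0, 0.2):
    fdr = estimate_fdr(m, m0, gamma, eps)
    bound = (1 + eps) * m0 * gamma / m
    print(f"eps = {eps}: estimated FDR = {fdr:.4f}, bound (1+eps)*m0*gamma/m = {bound:.4f}")
```

With this particular perturbation the bound is essentially attained, because $F_{0s}(\gamma_k) = (1+\epsilon)\gamma_k$ for every threshold $\gamma_k \le \gamma$.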